Methods for identifying risk of breast cancer and treatments thereof

ABSTRACT

Provided herein are methods for identifying risk of breast cancer in a subject and/or a subject at risk of breast cancer, reagents and kits for carrying out the methods, methods for identifying candidate therapeutics for treating breast cancer, and therapeutic methods for treating breast cancer in a subject. These embodiments are based upon an analysis of polymorphic variations in nucleotide sequences within the human genome.

RELATED PATENT APPLICATIONS

This patent application is a divisional of U.S. patent application Ser. No. 10/723,681, filed Nov. 23, 2003, which claims the benefit of provisional patent application No. 60/429,136 filed Nov. 25, 2002 and provisional patent application no. filed Jul. 24, 2003. Each of these provisional patent applications names Richard B. Roth et al. as inventors and is hereby incorporated herein by reference in its entirety, including all drawings and cited publications and documents. Also incorporated by reference are patent applications filed on Nov. 25, 2003, entitled “Methods for identifying risk of breast cancer and treatments thereof,” naming Richard B. Roth et al. as inventors and bearing attorney docket numbers 524592006600, 524592006700, 524592006800, 524592007000, 524592007100 and 524592007200. In addition, incorporated by reference is a patent application naming Matthew R. Nelson as an inventor, entitled “Disease risk prediction with associated single nucleotide polymorphisms,” having attorney docket number 524593006400.

FIELD OF THE INVENTION

The invention relates to genetic methods for identifying risk of breast cancer and treatments that specifically target the disease.

BACKGROUND

Breast cancer is the third most common cancer, and the most common cancer in women, as well as a cause of disability, psychological trauma, and economic loss. Breast cancer is the second most common cause of cancer death in women in the United States and, for women between the ages of 15 and 54, the leading cause of cancer-related death (Forbes, Seminars in Oncology, vol. 24(1), Suppl 1, 1997: pp. S1-20-S1-35). Indirect effects of the disease also contribute to the mortality from breast cancer, including consequences of advanced disease, such as metastases to the bone or brain. Complications arising from bone marrow suppression, radiation fibrosis and neutropenic sepsis, which are collateral effects of therapeutic interventions such as surgery, radiation, chemotherapy, or bone marrow transplantation, also contribute to the morbidity and mortality from this disease.

While the pathogenesis of breast cancer is unclear, transformation of normal breast epithelium to a malignant phenotype may be the result of genetic factors, especially in women under thirty (Miki, et al., Science, 266: 66-71 (1994)). However, it is likely that other, non-genetic factors also have a significant effect on the etiology of the disease. Regardless of its origin, breast cancer morbidity increases significantly if it is not detected early in its progression. Thus, considerable efforts have focused on the elucidation of early cellular events surrounding transformation in breast tissue. Such efforts have led to the identification of several potential breast cancer markers. For example, alleles of the BRCA1 and BRCA2 genes have been linked to hereditary and early-onset breast cancer (Wooster, et al., Science, 265: 2088-2090 (1994)). However, BRCA1 is limited as a cancer marker because BRCA1 mutations fail to account for the majority of breast cancers (Ford, et al., British J. Cancer, 72: 805-812 (1995)). Similarly, the BRCA2 gene, which has been linked to forms of hereditary breast cancer, accounts for only a small portion of total breast cancer cases.

SUMMARY

It has been discovered that certain polymorphic variations in human genomic DNA are associated with the occurrence of breast cancer. In particular, polymorphic variants in loci containing ICAM, MAPK10, KIAA0861, NUMA1/FLJ20625/LOC220074 (hereafter referred to as “NUMA1”), and HT014/LOC148902/LYPLA2/GALE (hereafter referred to as “GALE”) regions in human genomic DNA have been associated with risk of breast cancer.

Thus, featured herein are methods for identifying a subject at risk of breast cancer and/or a risk of breast cancer in a subject, which comprise detecting the presence or absence of one or more polymorphic variations associated with breast cancer in genomic regions described herein in a human nucleic acid sample. In an embodiment, two or more polymorphic variations are detected in two or more regions selected from the group consisting of ICAM, MAPK10, KIAA0861, NUMA1 and GALE. In certain embodiments, 3 or more, or 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19 or 20 or more polymorphic variants are detected. In specific embodiments, the group of polymorphic variants detected comprises or consists of polymorphic variants in ICAM, MAPK10, KIAA0861, NUMA1 and GALE, such as position 44247 in SEQ ID NO: 1 (ICAM), position 36424 in SEQ ID NO: 2 (MAPK10), position 48563 in SEQ ID NO: 3 (KIAA0861), position 49002 in SEQ ID NO: 4 (NUMA1) and position 174 in SEQ ID NO: 5 (GALE), for example.

Also featured are nucleic acids that include one or more polymorphic variations associated with the occurrence of breast cancer, as well as polypeptides encoded by these nucleic acids. Further, provided is a method for identifying a subject at risk of breast cancer and then prescribing to the subject a breast cancer detection procedure, prevention procedure and/or a treatment procedure. In addition, provided are methods for identifying candidate therapeutic molecules for treating breast cancer and related disorders, as well as methods for treating breast cancer in a subject by diagnosing breast cancer in the subject and treating the subject with a suitable treatment, such as administering a therapeutic molecule.

Also provided are compositions comprising a breast cancer cell and/or ICAM, MAPK10, KIAA0861, NUMA1 or GALE nucleic acid with an RNAi, siRNA, antisense DNA or RNA, or ribozyme nucleic acid designed from an ICAM, MAPK10, KIAA0861, NUMA1 or GALE nucleotide sequence. In an embodiment, the nucleic acid is designed from an ICAM, MAPK10, KIAA0861, NUMA1 or GALE nucleotide sequence that includes one or more breast cancer associated polymorphic variations, and in some instances, specifically interacts with such a nucleotide sequence. Further, provided are arrays of nucleic acids bound to a solid surface, in which one or more nucleic acid molecules of the array have an ICAM, MAPK10, KIAA0861, NUMA1 or GALE nucleotide sequence, or a fragment or substantially identical nucleic acid thereof, or a complementary nucleic acid of the foregoing. Featured also are compositions comprising a breast cancer cell and/or an ICAM, MAPK10, KIAA0861, NUMA1 or GALE polypeptide, with an antibody that specifically binds to the polypeptide. In an embodiment, the antibody specifically binds to an epitope in the polypeptide that includes a non-synonymous amino acid modification associated with breast cancer (e.g., results in an amino acid substitution in the encoded polypeptide associated with breast cancer). In certain embodiments, the antibody specifically binds to an epitope that comprises a proline at amino acid position 352 or an alanine at amino acid position 348 in an ICAM5 polypeptide.

BRIEF DESCRIPTION OF THE FIGURES

FIGS. 1A-1Y show a genomic nucleotide sequence for an ICAM region encoding ICAM 1, 4 and 5. The genomic nucleotide sequence is set forth in SEQ ID NO: 1. The following nucleotide representations are used throughout: “A” or “a” is adenosine, adenine, or adenylic acid; “C” or “c” is cytidine, cytosine, or cytidylic acid; “G” or “g” is guanosine, guanine, or guanylic acid; “T” or “t” is thymidine, thymine, or thymidylic acid; and “I” or “i” is inosine, hypoxanthine, or inosinic acid. Exons are indicated in italicized lower case type, introns are depicted in normal text lower case type, and polymorphic sites are depicted in bold upper case type. SNPs are designated by the following convention: “R” represents A or G; “M” represents A or C; “W” represents A or T; “Y” represents C or T; “S” represents C or G; “K” represents G or T; “V” represents A, C or G; “H” represents A, C, or T; “D” represents A, G, or T; “B” represents C, G, or T; and “N” represents A, G, C, or T.
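By way of illustration only, the SNP designation convention above can be expressed as a lookup table. The following Python sketch is not part of the original disclosure; the names `AMBIGUITY_CODES` and `possible_bases` are hypothetical and are introduced solely for this example.

```python
# Ambiguity codes exactly as listed in the figure description above.
AMBIGUITY_CODES = {
    "R": "AG", "M": "AC", "W": "AT", "Y": "CT", "S": "CG", "K": "GT",
    "V": "ACG", "H": "ACT", "D": "AGT", "B": "CGT", "N": "ACGT",
}

UNAMBIGUOUS = {"A", "C", "G", "T"}


def possible_bases(symbol):
    """Return the set of bases that a single sequence symbol may represent.

    Hypothetical helper: accepts upper or lower case, and raises KeyError
    for symbols outside the convention (inosine "I" is not handled here).
    """
    s = symbol.upper()
    if s in UNAMBIGUOUS:
        return {s}
    return set(AMBIGUITY_CODES[s])


if __name__ == "__main__":
    # A polymorphic site annotated "Y" may carry either C or T.
    print(sorted(possible_bases("Y")))
```

A site printed as “Y” in the figures would therefore be read as a C/T polymorphism, and “N” as any of the four bases.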

FIGS. 2A-2U show a genomic nucleotide sequence of a MAPK10 region. The genomic nucleotide sequence is set forth in SEQ ID NO: 2.

FIGS. 3A-3NN show a genomic nucleotide sequence of a KIAA0861 region. The genomic nucleotide sequence is set forth in SEQ ID NO: 3.

FIGS. 4A-4JJ show a genomic nucleotide sequence of a NUMA1/FLJ20625/LOC220074 region, referred to herein as the NUMA1 region. The genomic nucleotide sequence is set forth in SEQ ID NO: 4.

FIG. 5 shows a portion of a genomic nucleotide sequence of a HT014/LOC148902/LYPLA2/GALE region, referred to herein as the GALE region. The genomic nucleotide sequence is set forth in SEQ ID NO: 5.

FIGS. 6A-6C show coding nucleotide sequences (cDNA) for ICAM1, ICAM4 and ICAM5, respectively. The nucleotide sequences are set forth in SEQ ID NOs: 6, 7 and 8, respectively.

FIG. 7 shows a coding nucleotide sequence (cDNA) for MAPK10. The nucleotide sequence is set forth in SEQ ID NO: 9.

FIGS. 8A-8B show coding nucleotide sequences (cDNA) for KIAA0861. The nucleotide sequences are set forth in SEQ ID NOs: 10 and 11, respectively.

FIGS. 9A-9B show a coding nucleotide sequence (cDNA) for NUMA1. The nucleotide sequence is set forth in SEQ ID NO: 12.

FIGS. 10A-10C show amino acid sequences for ICAM1, ICAM4 and ICAM5 polypeptides. The amino acid sequences are set forth in SEQ ID NOs: 13, 14 and 15, respectively.

FIG. 11 shows an amino acid sequence for a MAPK10 polypeptide, which is set forth in SEQ ID NO: 16.

FIG. 12 shows an amino acid sequence for a KIAA0861 polypeptide, which is set forth in SEQ ID NO: 17.

FIG. 13 shows an amino acid sequence for a NUMA1 polypeptide, which is set forth in SEQ ID NO: 18.

FIG. 14 shows proximal SNPs in the ICAM region in genomic DNA. The position of each SNP on the chromosome is shown on the x-axis and the y-axis provides the negative logarithm of the p-value comparing the estimated allele to that of the control group. Also shown in the figure are exons and introns of the genes in the approximate chromosomal positions. The figure indicates that polymorphic variants associated with breast cancer are in linkage disequilibrium in a region spanning positions 11851-24282, 36340-37868, 41213-41613, 70875-74228, 42407-45536, or 42407-51102 in SEQ ID NO: 1.
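For orientation, the y-axis transformation described for FIGS. 14-17 can be sketched as follows. The text states only “negative logarithm”; base 10 is assumed here because it is the common convention for association plots, and the function name `neg_log10_p` is hypothetical.

```python
import math


def neg_log10_p(p_value):
    """Return -log10(p) for a p-value in the interval (0, 1].

    Assumption: base-10 logarithm. The source does not state the base.
    """
    if not 0.0 < p_value <= 1.0:
        raise ValueError("p-value must be greater than 0 and at most 1")
    return -math.log10(p_value)


if __name__ == "__main__":
    # Smaller p-values plot higher on the y-axis.
    for p in (0.05, 0.001):
        print(p, round(neg_log10_p(p), 3))
```

Under this transformation, SNPs more strongly associated with breast cancer appear as taller points along the chromosomal position axis.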

FIG. 15 shows proximal SNPs in the MAPK10 region in genomic DNA. The position of each SNP on the chromosome is shown on the x-axis and the y-axis provides the negative logarithm of the p-value comparing the estimated allele to that of the control group. Also shown in the figure are exons and introns of the genes in the approximate chromosomal positions. The figure indicates that polymorphic variants associated with breast cancer are in linkage disequilibrium in a region spanning positions 23826-36424, 46176-62572, 4512-8467 or 13787-14355 in SEQ ID NO: 2.

FIG. 16 shows proximal SNPs in the KIAA0861 region in genomic DNA. The position of each SNP on the chromosome is shown on the x-axis and the y-axis provides the negative logarithm of the p-value comparing the estimated allele to that of the control group. Also shown in the figure are exons and introns of the genes in the approximate chromosomal positions. The figure indicates that polymorphic variants associated with breast cancer are in linkage disequilibrium in a region spanning positions 42164-48563 in SEQ ID NO: 3.

FIG. 17 shows proximal SNPs in the NUMA1 region in genomic DNA. The position of each SNP on the chromosome is shown on the x-axis and the y-axis provides the negative logarithm of the p-value comparing the estimated allele to that of the control group. Also shown in the figure are exons and introns of the genes in the approximate chromosomal positions. The figure indicates that polymorphic variants associated with breast cancer are in linkage disequilibrium in a region spanning positions 174-32954, 38115-43785, 45386-52058, 52257-54411, 55303-73803 or 96470-98184 in SEQ ID NO: 4.

FIG. 18 shows results of an odds-ratio meta analysis for the ICAM region.

FIG. 19 shows results of an odds-ratio meta analysis for the MAPK10 region.

FIG. 20 shows results of an odds-ratio meta analysis for the KIAA0861 region.

FIG. 21 shows results of an odds-ratio meta analysis for the NUMA1 region.

FIG. 22 shows effects of ICAM-directed siRNA on cancer cell proliferation.

DETAILED DESCRIPTION

It has been discovered that polymorphic variations in the ICAM, MAPK10, KIAA0861, NUMA1 and GALE regions described herein are associated with an increased risk of breast cancer.

All ICAM proteins are type I transmembrane glycoproteins, contain 2-9 immunoglobulin-like C2-type domains, and bind to the leukocyte adhesion LFA-1 protein. The proteins are members of the intercellular adhesion molecule (ICAM) family. The gene ICAM1 (intercellular adhesion molecule-1) is also known as human rhinovirus receptor, BB2, CD54, and cell surface glycoprotein P3.58. ICAM1 has been mapped to chromosomal position 19p13.3-p13.2. ICAM1 (CD54) typically is expressed on endothelial cells and cells of the immune system. ICAM1 binds to integrins of type CD11a/CD18 or CD11b/CD18. ICAM1 is also exploited by rhinovirus as a receptor.

The gene ICAM4 (intercellular adhesion molecule 4) is also known as the Landsteiner-Wiener blood group or LW. ICAM4 has been mapped to 19p13.2-cen. The protein encoded by this gene is a member of the intercellular adhesion molecule (ICAM) family. A glutamine to arginine polymorphism in this protein is responsible for the Landsteiner-Wiener blood group system (GLN=WB(A); ARG=WB(B)). This gene consists of 3 exons and alternative splicing generates 2 transcript variants.

The gene ICAM5 (intercellular adhesion molecule 5) is also known as telencephalin. ICAM5 has been mapped to 19p13.2. The protein encoded by the gene is expressed on the surface of telencephalic neurons and displays two types of adhesion activity, homophilic binding between neurons and heterophilic binding between neurons and leukocytes. It may be a critical component in neuron-microglial cell interactions in the course of normal development or as part of neurodegenerative diseases.

The gene MAPK10 also is known as JNK3, JNK3A, PRKM10, p493F12, FLJ12099, p54bSAPK MAP kinase, c-Jun kinase 3, JNK3 alpha protein kinase, c-Jun N-terminal kinase 3, stress activated protein kinase JNK3, and stress activated protein kinase beta. MAPK10 has been mapped to chromosomal position 4q22.1-q23. The protein encoded by this gene is a member of the MAP kinase family. MAP kinases act as an integration point for multiple biochemical signals, and are involved in a wide variety of cellular processes such as proliferation, differentiation, transcription regulation and development. This protein is a neuronal-specific form of c-Jun N-terminal kinases (JNKs). Through its phosphorylation and nuclear localization, this kinase plays regulatory roles in the signaling pathways during neuronal apoptosis. Beta-arrestin 2, a receptor-regulated MAP kinase scaffold protein, is found to interact with, and stimulate the phosphorylation of, this kinase by MAP kinase kinase 4 (MKK4). Cyclin-dependent kinase 5 can phosphorylate, and inhibit the activity of, this kinase, which may be important in preventing neuronal apoptosis. Four alternatively spliced transcript variants encoding distinct isoforms have been reported.

The gene KIAA0861 is a Rho family guanine-nucleotide exchange factor. KIAA0861 has been mapped to chromosomal position 3q27.3. KIAA0861 is a Rho family nucleotide exchange factor homolog that modulates the activity of Rho family GTPases, which control numerous cell functions, including cell growth, adhesion, movement and shape. RhoC GTPase is overexpressed in invasive (inflammatory) breast cancers.

The gene FLJ20625 has been mapped to chromosomal position 11q13.3. The gene encoding LOC220074 also is known as Hypothetical 55.1 kDa protein F09G8.5 in chromosome III and has been mapped to chromosomal position 11q13.3.

The gene HT014 has been mapped to chromosomal position 1p36.11. The gene LYPLA2 (lysophospholipase II) also is known as APT-2, DJ886K2.4 and acyl-protein thioesterase and has been mapped to chromosomal position 1p36.12-p35.1. Lysophospholipases are enzymes that act on biological membranes to regulate the multifunctional lysophospholipids. There are alternatively spliced transcript variants described for this gene but the full length nature is not known yet.

The gene GALE (galactose-4-epimerase, UDP-) also is known as galactowaldenase and UDP galactose-4-epimerase and has been mapped to chromosomal position 1p36-p35. This gene encodes UDP-galactose-4-epimerase, which catalyzes 2 distinct but analogous reactions: the epimerization of UDP-glucose to UDP-galactose, and the epimerization of UDP-N-acetylglucosamine to UDP-N-acetylgalactosamine. The bifunctional nature of the enzyme has the important metabolic consequence that mutant cells (or individuals) are dependent not only on exogenous galactose, but also on exogenous N-acetylgalactosamine as a necessary precursor for the synthesis of glycoproteins and glycolipids. Missense mutations in the GALE gene result in epimerase-deficiency galactosemia.

Breast Cancer and Sample Selection

Breast cancer is typically described as the uncontrolled growth of malignant breast tissue. Breast cancers arise most commonly in the lining of the milk ducts of the breast (ductal carcinoma), or in the lobules where breast milk is produced (lobular carcinoma). Other forms of breast cancer include Inflammatory Breast Cancer and Recurrent Breast Cancer. Inflammatory breast cancer is a rare, but very serious, aggressive type of breast cancer. The breast may look red and feel warm with ridges, welts, or hives on the breast; or the skin may look wrinkled. It is sometimes misdiagnosed as a simple infection. Recurrent disease means that the cancer has come back after it has been treated. It may come back in the breast, in the soft tissues of the chest (the chest wall), or in another part of the body.

As used herein, the term “breast cancer” refers to a condition characterized by anomalous rapid proliferation of abnormal cells in one or both breasts of a subject. The abnormal cells often are referred to as “neoplastic cells,” which are transformed cells that can form a solid tumor. The term “tumor” refers to an abnormal mass or population of cells (i.e. two or more cells) that result from excessive or abnormal cell division, whether malignant or benign, and pre-cancerous and cancerous cells. Malignant tumors are distinguished from benign growths or tumors in that, in addition to uncontrolled cellular proliferation, they can invade surrounding tissues and can metastasize. In breast cancer, neoplastic cells may be identified in one or both breasts only and not in another tissue or organ, in one or both breasts and one or more adjacent tissues or organs (e.g. lymph node), or in a breast and one or more non-adjacent tissues or organs to which the breast cancer cells have metastasized.

The term “invasion” as used herein refers to the spread of cancerous cells to adjacent surrounding tissues. The term “invasion” often is used synonymously with the term “metastasis,” which as used herein refers to a process in which cancer cells travel from one organ or tissue to another non-adjacent organ or tissue. Cancer cells in the breast(s) can spread to tissues and organs of a subject, and conversely, cancer cells from other organs or tissue can invade or metastasize to a breast. Cancerous cells from the breast(s) may invade or metastasize to any other organ or tissue of the body. Breast cancer cells often invade lymph node cells and/or metastasize to the liver, brain and/or bone and spread cancer in these tissues and organs. Breast cancers can spread to other organs and tissues and cause lung cancer, prostate cancer, colon cancer, ovarian cancer, cervical cancer, gastrointestinal cancer, pancreatic cancer, glioblastoma, bladder cancer, hepatoma, colorectal cancer, uterine cervical cancer, endometrial carcinoma, salivary gland carcinoma, kidney cancer, vulval cancer, thyroid cancer, hepatic carcinoma, skin cancer, melanoma, ovarian cancer, neuroblastoma, myeloma, various types of head and neck cancer, acute lymphoblastic leukemia, acute myeloid leukemia, Ewing sarcoma and peripheral neuroepithelioma, and other carcinomas, lymphomas, blastomas, sarcomas, and leukemias.

As used herein, the term “breast cancer” may include both Inflammatory Breast Cancer and Recurrent Breast Cancer.

In an effort to detect breast cancer as early as possible, regular physical exams and screening mammograms often are prescribed and conducted. A diagnostic mammogram often is performed to evaluate a breast complaint or abnormality detected by physical exam or routine screening mammography. If an abnormality seen with diagnostic mammography is suspicious, additional breast imaging (with exams such as ultrasound) or a biopsy may be ordered. A biopsy followed by pathological (microscopic) analysis is a definitive way to determine whether a subject has breast cancer. Excised breast cancer samples often are subjected to the following analyses: diagnosis of the breast tumor and confirmation of its malignancy; maximum tumor thickness; assessment of completeness of excision of invasive and in situ components and microscopic measurements of the shortest extent of clearance; level of invasion; presence and extent of regression; presence and extent of ulceration; histological type and special variants; pre-existing lesion; mitotic rate; vascular invasion; neurotropism; cell type; tumor lymphocyte infiltration; and growth phase.

The stage of a breast cancer can be classified as a range of stages from Stage 0 to Stage IV based on its size and the extent to which it has spread. The following table summarizes the stages:

TABLE A

Stage   Tumor Size        Lymph Node Involvement          Metastasis (Spread)
I       Less than 2 cm    No                              No
II      Between 2-5 cm    No or in same side of breast    No
III     More than 5 cm    Yes, on same side of breast     No
IV      Not applicable    Not applicable                  Yes

Stage 0 cancer is a contained cancer that has not spread beyond the breast ductal system. Fifteen to twenty percent of breast cancers detected by clinical examinations or testing are in Stage 0 (the earliest form of breast cancer). Two types of Stage 0 cancer are lobular carcinoma in situ (LCIS) and ductal carcinoma in situ (DCIS). LCIS indicates high risk for breast cancer. Many physicians do not classify LCIS as a malignancy and often encounter LCIS by chance on breast biopsy while investigating another area of concern. While the microscopic features of LCIS are abnormal and are similar to malignancy, LCIS does not behave as a cancer (and therefore is not treated as a cancer). LCIS is merely a marker for a significantly increased risk of cancer anywhere in the breast. However, bilateral simple mastectomy may be occasionally performed if LCIS patients have a strong family history of breast cancer. In DCIS the cancer cells are confined to milk ducts in the breast and have not spread into the fatty breast tissue or to any other part of the body (such as the lymph nodes). DCIS may be detected on mammogram as tiny specks of calcium (known as microcalcifications) 80% of the time. Less commonly DCIS can present itself as a mass with calcifications (15% of the time); and even less likely as a mass without calcifications (<5% of the time). A breast biopsy is used to confirm DCIS. A standard DCIS treatment is breast-conserving therapy (BCT), which is lumpectomy followed by radiation treatment, or mastectomy. To date, DCIS patients have chosen equally among lumpectomy and mastectomy as their treatment option, though specific cases may sometimes favor lumpectomy over mastectomy or vice versa.

In Stage I, the primary (original) cancer is 2 cm or less in diameter and has not spread to the lymph nodes. In Stage IIA, the primary tumor is between 2 and 5 cm in diameter and has not spread to the lymph nodes. In Stage IIB, the primary tumor is between 2 and 5 cm in diameter and has spread to the axillary (underarm) lymph nodes; or the primary tumor is over 5 cm and has not spread to the lymph nodes. In Stage IIIA, the primary breast cancer is of any kind and has spread to the axillary (underarm) lymph nodes and to axillary tissues. In Stage IIIB, the primary breast cancer is any size, has attached itself to the chest wall, and has spread to the pectoral (chest) lymph nodes. In Stage IV, the primary cancer has spread out of the breast to other parts of the body (such as bone, lung, liver, brain). The treatment of Stage IV breast cancer focuses on extending survival time and relieving symptoms.

Based in part upon selection criteria set forth above, individuals having breast cancer can be selected for genetic studies. Also, individuals having no history of cancer or breast cancer often are selected for genetic studies. Other selection criteria can include: a tissue or fluid sample is derived from an individual characterized as Caucasian; the sample was derived from an individual of German paternal and maternal descent; the database included relevant phenotype information for the individual; case samples were derived from individuals diagnosed with breast cancer; control samples were derived from individuals free of cancer and having no family history of breast cancer; and sufficient genomic DNA was extracted from each blood sample for all allelotyping and genotyping reactions performed during the study. Phenotype information included pre- or post-menopausal status, familial predisposition, country of origin of mother and father, diagnosis with breast cancer (date of primary diagnosis, age of individual as of primary diagnosis, grade or stage of development, occurrence of metastases, e.g., lymph node metastases, organ metastases), condition of body tissue (skin tissue, breast tissue, ovary tissue, peritoneum tissue and myometrium), and method of treatment (surgery, chemotherapy, hormone therapy, radiation therapy).

Provided herein is a set of blood samples and a set of corresponding nucleic acid samples isolated from the blood samples, where the blood samples are donated from individuals diagnosed with breast cancer. The sample set often includes blood samples or nucleic acid samples from 100 or more, 150 or more, or 200 or more individuals having breast cancer, and sometimes from 250 or more, 300 or more, 400 or more, or 500 or more individuals. The individuals can have parents from any place of origin, and in an embodiment, the set of samples is extracted from individuals of German paternal and German maternal ancestry. The samples in each set may be selected based upon five or more criteria and/or phenotypes set forth above.

Polymorphic Variants Associated with Breast Cancer

A genetic analysis provided herein linked breast cancer with polymorphic variants in the ICAM, MAPK10, KIAA0861, NUMA1 and GALE regions of the human genome disclosed herein. As used herein, the term “polymorphic site” refers to a region in a nucleic acid at which two or more alternative nucleotide sequences are observed in a significant number of nucleic acid samples from a population of individuals. A polymorphic site may be a nucleotide sequence of two or more nucleotides, an inserted nucleotide or nucleotide sequence, a deleted nucleotide or nucleotide sequence, or a microsatellite, for example. A polymorphic site that is two or more nucleotides in length may be 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15 or more, 20 or more, 30 or more, 50 or more, 75 or more, 100 or more, 500 or more, or about 1000 nucleotides in length, where all or some of the nucleotide sequences differ within the region. A polymorphic site is often one nucleotide in length, which is referred to herein as a “single nucleotide polymorphism” or a “SNP.”

Where there are two, three, or four alternative nucleotide sequences at a polymorphic site, each nucleotide sequence is referred to as a “polymorphic variant” or “nucleic acid variant.” Where two polymorphic variants exist, for example, the polymorphic variant represented in a minority of samples from a population is sometimes referred to as a “minor allele” and the polymorphic variant that is more prevalently represented is sometimes referred to as a “major allele.” Many organisms possess two copies of each chromosome (e.g., humans), and those individuals who possess two major alleles or two minor alleles are often referred to as being “homozygous” with respect to the polymorphism, and those individuals who possess one major allele and one minor allele are normally referred to as being “heterozygous” with respect to the polymorphism. Individuals who are homozygous with respect to one allele are sometimes predisposed to a different phenotype as compared to individuals who are heterozygous or homozygous with respect to another allele.
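The terms defined in the preceding paragraph can be illustrated with a short Python sketch. It is offered as an example only and is not the analysis used in the disclosure: genotypes are assumed to be given as two-character strings (one character per chromosome copy), the site is assumed to be biallelic, and the function names are hypothetical.

```python
from collections import Counter


def allele_frequencies(genotypes):
    """Return the frequency of each allele across all chromosome copies.

    genotypes: iterable of 2-character strings such as "CC", "CA", "AA"
    (an assumed encoding, one character per chromosome copy).
    """
    counts = Counter()
    for genotype in genotypes:
        counts.update(genotype)  # counts each of the two alleles
    total = sum(counts.values())
    return {allele: n / total for allele, n in counts.items()}


def major_and_minor(genotypes):
    """Return (major allele, minor allele) for a biallelic site."""
    freqs = allele_frequencies(genotypes)
    ranked = sorted(freqs, key=freqs.get, reverse=True)
    return ranked[0], ranked[-1]


def zygosity(genotype):
    """Classify one individual's genotype at the site."""
    return "homozygous" if genotype[0] == genotype[1] else "heterozygous"


if __name__ == "__main__":
    sample = ["CC", "CC", "CA", "AA"]  # invented data for illustration
    print(allele_frequencies(sample))
    print(major_and_minor(sample))
    print([zygosity(g) for g in sample])
```

In the invented sample, cytosine occupies five of the eight chromosome copies and is therefore the major allele, while adenine is the minor allele.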

Furthermore, a genotype or polymorphic variant may be expressed in terms of a “haplotype,” which as used herein refers to two or more polymorphic variants occurring within genomic DNA in a group of individuals within a population. For example, two SNPs may exist within a gene where each SNP position includes a cytosine variation and an adenine variation. Certain individuals in a population may carry one allele (heterozygous) or two alleles (homozygous) having the gene with a cytosine at each SNP position. As the two cytosines corresponding to each SNP in the gene travel together on one or both alleles in these individuals, the individuals can be characterized as having a cytosine/cytosine haplotype with respect to the two SNPs in the gene.

As used herein, the term “phenotype” refers to a trait which can be compared between individuals, such as presence or absence of a condition, a visually observable difference in appearance between individuals, metabolic variations, physiological variations, variations in the function of biological molecules, and the like. An example of a phenotype is occurrence of breast cancer.

Researchers sometimes report a polymorphic variant in a database without determining whether the variant is represented in a significant fraction of a population. Because a subset of these reported polymorphic variants are not represented in a statistically significant portion of the population, some of them are sequencing errors and/or not biologically relevant. Thus, it is often not known whether a reported polymorphic variant is statistically significant or biologically relevant until the presence of the variant is detected in a population of individuals and the frequency of the variant is determined. Methods for detecting a polymorphic variant in a population are described herein, specifically in Example 2. A polymorphic variant is statistically significant and often biologically relevant if it is represented in 5% or more of a population, sometimes 10% or more, 15% or more, or 20% or more of a population, and often 25% or more, 30% or more, 35% or more, 40% or more, 45% or more, or 50% or more of a population.
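The representation thresholds recited above can be checked against genotype data as in the following sketch. It assumes, for illustration only, that “represented in” a population means the fraction of individuals carrying at least one copy of the variant; the disclosure does not specify the calculation, and the function names and sample data are hypothetical.

```python
def carrier_fraction(genotypes, allele):
    """Fraction of individuals carrying at least one copy of `allele`.

    genotypes: list of 2-character strings such as "CC" or "CA"
    (an assumed encoding, one character per chromosome copy).
    """
    if not genotypes:
        raise ValueError("at least one genotype is required")
    carriers = sum(1 for genotype in genotypes if allele in genotype)
    return carriers / len(genotypes)


def meets_threshold(genotypes, allele, threshold=0.05):
    """True if the variant is carried by `threshold` or more of the sample."""
    return carrier_fraction(genotypes, allele) >= threshold


if __name__ == "__main__":
    # Invented data: 1 carrier of "A" among 20 individuals.
    population = ["CC"] * 19 + ["CA"]
    print(carrier_fraction(population, "A"))
    print(meets_threshold(population, "A"))
```

Other thresholds recited in the paragraph, such as 10% or 25%, can be tested by passing a different `threshold` value.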

A polymorphic variant may be detected on either or both strands of a double-stranded nucleic acid. For example, a thymine at a particular position in SEQ ID NO: 1 can be reported as an adenine from the complementary strand. Also, a polymorphic variant may be located within an intron or exon of a gene or within a portion of a regulatory region such as a promoter, a 5′ untranslated region (UTR), a 3′ UTR, and in DNA (e.g., genomic DNA (gDNA) and complementary DNA (cDNA)), RNA (e.g., mRNA, tRNA, and rRNA), or a polypeptide. Polymorphic variations may or may not result in detectable differences in gene expression, polypeptide structure, or polypeptide function.
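The strand equivalence noted above (a thymine on one strand reported as an adenine on the complementary strand) follows from standard Watson-Crick base pairing and can be sketched as below. The helper names are hypothetical and the example is illustrative only.

```python
# Standard Watson-Crick pairing for DNA bases.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement_base(base):
    """Return the base found opposite `base` on the complementary strand."""
    return COMPLEMENT[base.upper()]


def reverse_complement(sequence):
    """Return the complementary strand, read in its own 5' to 3' direction."""
    return "".join(COMPLEMENT[b] for b in reversed(sequence.upper()))


if __name__ == "__main__":
    # A thymine variant corresponds to an adenine on the other strand.
    print(complement_base("T"))
    print(reverse_complement("ATGC"))
```

A genotype of T/C reported on one strand is thus equivalent to A/G reported on the other.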

In the genetic analysis that associated breast cancer with the polymorphic variants described hereafter, samples from individuals having breast cancer and individuals not having cancer were allelotyped and genotyped. The term “genotyped” as used herein refers to a process for determining a genotype of one or more individuals, where a “genotype” is a representation of one or more polymorphic variants in a population. Genotypes may be expressed in terms of a “haplotype,” which as used herein refers to two or more polymorphic variants occurring within genomic DNA in a group of individuals within a population. For example, two SNPs may exist within a gene where each SNP position includes a cytosine variation and an adenine variation. Certain individuals in a population may carry one allele (heterozygous) or two alleles (homozygous) having the gene with a cytosine at each SNP position. As the two cytosines corresponding to each SNP in the gene travel together on one or both alleles in these individuals, the individuals can be characterized as having a cytosine/cytosine haplotype with respect to the two SNPs in the gene.

It was determined that polymorphic variations associated with an increased risk of breast cancer existed in ICAM, MAPK10, KIAA0861, NUMA1 or GALE nucleotide sequences. Polymorphic variants in and around the ICAM, MAPK10, KIAA0861, NUMA1 and GALE loci were tested for association with breast cancer. In the ICAM locus, these included polymorphic variants at positions in SEQ ID NO: 1 selected from the group consisting of 139, 11799, 11851, 11851, 11963, 24282, 26849, 29633, 31254, 31967, 32920, 33929, 35599, 36101, 36101, 36340, 36405, 36517, 36777, 36992, 37645, 37868, 38440, 38440, 38532, 38532, 38547, 38547, 38712, 40684, 40860, 41213, 41419, 41613, 42407, 43440, 43440, 44247, 44247, 44247, 44247, 44677, 44677, 45256, 45256, 45536, 45536, 46153, 47546, 47697, 47944, 47944, 48530, 51102, 57090, 60093, 60439, 62694, 66260, 67295, 67295, 67304, 67731, 67731, 68555, 68555, 70429, 70875, 72360, 74228, 76802, 77664, 78803, 79263, 80810, 81020, 82426, 82783, 85912, 85912, 86135, 86135, 87877, 87877, 88043, 88043, 88206, 88343, 90701, 90701, 90974, 91060, 91087, 91594, 91594, 92302, 92384, 36517, and 44677. Polymorphic variants in a region spanning positions 11851-24282, 36340-37868, 41213-41613, 70875-74228, 42407-45536, and 42407-51102 in SEQ ID NO: 1 in particular were associated with an increased risk of breast cancer, including polymorphic variants at positions 11963, 36340, 36992, 37868, 41213, 41419, 41613, 42407, 44247, 44677, 45256, 45536, 51102, 72360, 36517, and 44677 in SEQ ID NO: 1. At these positions in SEQ ID NO: 1, an adenine at position 11963, a guanine at position 36340, an adenine at position 36992, a guanine at position 37868, a cytosine at position 41213, a guanine at position 41419, a guanine at position 41613, a cytosine at position 42407, a cytosine at position 44247, an adenine or cytosine at position 44677, a thymine at position 45256, a guanine at position 45536, a cytosine at position 51102, a guanine at position 72360, a cytosine at position 36517, and a guanine at position 44677, in particular were associated with risk of breast cancer. Also, a proline at amino acid position 352 or an alanine at amino acid position 348 in SEQ ID NO: 15 were in particular associated with an increased risk of breast cancer.

In the MAPK10 locus, these included polymorphic variants at positions in SEQ ID NO: 2 selected from the group consisting of 191, 1490, 3781, 3935, 4512, 7573, 8467, 9001, 9732, 13477, 13787, 13903, 14355, 15053, 15459, 17762, 19482, 19631, 22170, 22688, 22748, 23376, 23826, 23868, 24154, 25972, 26057, 26361, 26599, 26712, 26812, 27069, 32421, 33557, 35127, 35222, 35999, 36424, 37403, 39203, 39226, 41147, 46176, 50452, 52919, 60214, 61093, 62572, 63601, 65362, 65863, 66207, 66339, 69512, 70759, 71217, 73382, and 76307. Polymorphic variants in a region spanning positions 23826-36424, 46176-62572, 4512-8467 or 13787-14355 in SEQ ID NO: 2 in particular were associated with an increased risk of breast cancer, including polymorphic variants at positions 7573, 13903, 23826, 26057, 26361, 26599, 26812, 27069, 35127, 35222, 36424, 46176, 50452, 61093, 62572, and 70759 in SEQ ID NO: 2. At these positions in SEQ ID NO: 2, a guanine at position 7573, a guanine at position 13903, an adenine at position 23826, an adenine at position 26057, a thymine at position 26361, an adenine at position 26599, an adenine at position 26812, a cytosine at position 27069, an adenine at position 35127, a thymine at position 35222, a cytosine at position 36424, a cytosine at position 46176, a cytosine at position 50452, a guanine at position 61093, an adenine at position 62572, and a guanine at position 70759, in particular were associated with risk of breast cancer.

In the KIAA0861 locus, these included polymorphic variants at positions in SEQ ID NO: 3 selected from the group consisting of 107, 2157, 7300, 8233, 9647, 9868, 9889, 10621, 11003, 11507, 11527, 11718, 11808, 12024, 13963, 14300, 14361, 16287, 18635, 19365, 24953, 25435, 26847, 27492, 27620, 27678, 27714, 29719, 30234, 31909, 32153, 33572, 42164, 43925, 45031, 45655, 48350, 48418, 48563, 53189, 56468, 59358, 63761, 65931, 67040, 69491, 83308, 126545, 137592, and 147169. Polymorphic variants in a region spanning positions 42164-48563 in SEQ ID NO: 3 in particular were associated with an increased risk of breast cancer, including polymorphic variants at positions 107, 42164, 45031, 45655, 48563, 19365 and 14361 in SEQ ID NO: 3. At these positions in SEQ ID NO: 3, an adenine at position 107, a thymine at position 14361, a guanine at position 19365, a thymine at position 42164, a cytosine at position 45031, a thymine at position 45655 and a cytosine at position 48563, in particular were associated with risk of breast cancer. Also, a leucine at amino acid position 359 in SEQ ID NO: 17, a leucine at amino acid position 378 in SEQ ID NO: 17, or an alanine at amino acid position 857 in SEQ ID NO: 17 were in particular associated with an increased risk of breast cancer.

In the NUMA1 locus, these included polymorphic variants at positions in SEQ ID NO: 4 selected from the group consisting of 174, 815, 3480, 9715, 14755, 15912, 19834, 19850, 20171, 20500, 20536, 23187, 25289, 25470, 28720, 29566, 30155, 30752, 32710, 32954, 33725, 33842, 36345, 38115, 39150, 40840, 41969, 42045, 43785, 44444, 44579, 45386, 46827, 47320, 47625, 47837, 47866, 49002, 49566, 52058, 52249, 52257, 52850, 53860, 54052, 54411, 55098, 55303, 59398, 59533, 60542, 61541, 62309, 72299, 73031, 73803, 80950, 82137, 96077, 96470, 98116, 98184, and 132952. Polymorphic variants in a region spanning positions 174-32954, 38115-43785, 45386-52058, 52257-54411, 55303-73803 or 96470-98184 in SEQ ID NO: 4 in particular were associated with an increased risk of breast cancer, including polymorphic variants at positions 174, 815, 3480, 19834, 19850, 20171, 20500, 20536, 23187, 25470, 30155, 30752, 32710, 32954, 38115, 39150, 40840, 41969, 42045, 43785, 45386, 46827, 47320, 47625, 47837, 47866, 49002, 49566, 52058, 52257, 52850, 53860, 54052, 54411, 55303, 59398, 60542, 62309, 72299, 73031, 73803, and 98116 in SEQ ID NO: 4. At these positions in SEQ ID NO: 4, a thymine at position 174, an adenine at position 815, a cytosine at position 3480, a guanine at position 19834, an adenine at position 19850, a thymine at position 20171, a thymine at position 20500, a cytosine at position 20536, a cytosine at position 23187, a thymine at position 25470, a thymine at position 30155, a guanine at position 30752, a thymine at position 32710, a guanine at position 32954, an adenine at position 38115, a cytosine at position 39150, a thymine at position 40840, an adenine at position 41969, a thymine at position 42045, a guanine at position 43785, a cytosine at position 45386, an adenine at position 46827, an adenine at position 47320, a cytosine at position 47625, a cytosine at position 47837, an adenine at position 47866, a cytosine at position 49002, a thymine at position 49566, a cytosine at position 52058, a thymine at position 52257, a thymine at position 52850, a cytosine at position 53860, a cytosine at position 54052, a thymine at position 54411, a cytosine at position 55303, an adenine at position 59398, an adenine at position 60542, an adenine at position 62309, a cytosine at position 72299, a thymine at position 73031, a guanine at position 73803, and a thymine at position 98116, in particular were associated with risk of breast cancer. In the GALE locus, a polymorphic variant at position 174 in SEQ ID NO: 5 was in particular associated with increased risk of breast cancer, and an adenine at this position was the cancer-associated allele.

Additional Polymorphic Variants Associated with Breast Cancer

Also provided is a method for identifying polymorphic variants proximal to an incident, founder polymorphic variant associated with breast cancer. Thus, featured herein are methods for identifying a polymorphic variation associated with breast cancer that is proximal to an incident polymorphic variation associated with breast cancer, which comprises identifying a polymorphic variant proximal to the incident polymorphic variant associated with breast cancer, where the incident polymorphic variant is in a nucleotide sequence set forth in SEQ ID NO: 1-5. The nucleotide sequence often comprises a polynucleotide sequence selected from the group consisting of (a) a nucleotide sequence set forth in SEQ ID NO: 1-5; (b) a nucleotide sequence which encodes a polypeptide having an amino acid sequence encoded by a nucleotide sequence in SEQ ID NO: 1-5; (c) a nucleotide sequence which encodes a polypeptide that is 90% or more identical to an amino acid sequence encoded by a nucleotide sequence in SEQ ID NO: 1-5 or a nucleotide sequence about 90% or more identical to the nucleotide sequence set forth in SEQ ID NO: 1-5; and (d) a fragment of a nucleotide sequence of (a), (b), or (c), often a fragment that includes a polymorphic site associated with breast cancer. The presence or absence of an association of the proximal polymorphic variant with breast cancer then is determined using a known association method, such as a method described in the Examples hereafter. In an embodiment, the incident polymorphic variant is described in SEQ ID NO: 1-5. In another embodiment, the proximal polymorphic variant identified sometimes is a publicly disclosed polymorphic variant, which, for example, sometimes is published in a publicly available database. In other embodiments, the polymorphic variant identified is not publicly disclosed and is discovered using a known method, including, but not limited to, sequencing a region surrounding the incident polymorphic variant in a group of nucleic acid samples. Thus, multiple polymorphic variants proximal to an incident polymorphic variant are associated with breast cancer using this method.

The proximal polymorphic variant often is identified in a region surrounding the incident polymorphic variant. In certain embodiments, this surrounding region is about 50 kb flanking the first polymorphic variant (e.g., about 50 kb 5′ of the first polymorphic variant and about 50 kb 3′ of the first polymorphic variant), and the region sometimes is composed of shorter flanking sequences, such as flanking sequences of about 40 kb, about 30 kb, about 25 kb, about 20 kb, about 15 kb, about 10 kb, about 7 kb, about 5 kb, or about 2 kb 5′ and 3′ of the incident polymorphic variant. In other embodiments, the region is composed of longer flanking sequences, such as flanking sequences of about 55 kb, about 60 kb, about 65 kb, about 70 kb, about 75 kb, about 80 kb, about 85 kb, about 90 kb, about 95 kb, or about 100 kb 5′ and 3′ of the incident polymorphic variant.
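The flanking-region selection described above amounts to a coordinate window around the incident variant. The following Python sketch shows one way to compute such a window and to list the candidate positions that fall inside it. It assumes 1-based coordinates within a single reference sequence and takes 1 kb as 1,000 bases; the function names are hypothetical, and the positions in the usage example serve only as coordinates and imply nothing about association.

```python
def flanking_region(position, flank_kb=50, sequence_length=None):
    """Return (start, end), 1-based and inclusive, of the region extending
    `flank_kb` kilobases 5' and 3' of `position`.

    The window is clipped at the start of the sequence and, if
    `sequence_length` is given, at its end.
    """
    flank = int(flank_kb * 1000)
    start = max(1, position - flank)
    end = position + flank
    if sequence_length is not None:
        end = min(end, sequence_length)
    return start, end


def variants_in_region(candidate_positions, incident_position, flank_kb=50):
    """Candidate positions lying within the flanking region, excluding the
    incident position itself."""
    start, end = flanking_region(incident_position, flank_kb)
    return [
        p for p in candidate_positions
        if start <= p <= end and p != incident_position
    ]


if __name__ == "__main__":
    candidates = [42407, 43440, 44247, 44677, 45256, 45536, 46153, 47546]
    print(flanking_region(44677, flank_kb=2))
    print(variants_in_region(candidates, 44677, flank_kb=2))
```

Calling the second function repeatedly, each time with a newly associated proximal variant as the incident position, gives a simple model of the iterative identification of further variants.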

In certain embodiments, polymorphic variants associated with breast cancer are identified iteratively. For example, a first proximal polymorphic variant is associated with breast cancer using the methods described above and then another polymorphic variant proximal to the first proximal polymorphic variant is identified (e.g., publicly disclosed or discovered) and the presence or absence of an association of one or more other polymorphic variants proximal to the first proximal polymorphic variant with breast cancer is determined.

The methods described herein are useful for identifying or discovering additional polymorphic variants that may be used to further characterize a gene, region or locus associated with a condition, a disease (e.g., breast cancer), or a disorder. For example, allelotyping or genotyping data from the additional polymorphic variants may be used to identify a functional mutation or a region of linkage disequilibrium.

In certain embodiments, polymorphic variants identified or discovered within a region comprising the first polymorphic variant associated with breast cancer are genotyped using the genetic methods and sample selection techniques described herein, and it can be determined whether those polymorphic variants are in linkage disequilibrium with the first polymorphic variant. The size of the region in linkage disequilibrium with the first polymorphic variant also can be assessed using these genotyping methods. Thus, provided herein are methods for determining whether a polymorphic variant is in linkage disequilibrium with a first polymorphic variant associated with breast cancer, and such information can be used in prognosis methods described herein.
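The text above does not specify how linkage disequilibrium is quantified. As an illustration only, the following Python sketch computes three commonly used pairwise measures (D, the normalized |D′|, and r²) from phased two-site haplotypes. The input format, the function name, and the choice of measures are assumptions made for this sketch and are not taken from the genotyping methods described herein.

```python
def linkage_disequilibrium(haplotypes, allele_a, allele_b):
    """Pairwise LD between two sites from phased haplotypes.

    `haplotypes` is a list of (allele_at_site_1, allele_at_site_2) tuples,
    one per chromosome (a hypothetical input format). `allele_a` and
    `allele_b` are the reference alleles at site 1 and site 2.
    Returns a dict with D, |D'| and r^2.
    """
    n = len(haplotypes)
    if n == 0:
        raise ValueError("at least one haplotype is required")
    p_a = sum(1 for h in haplotypes if h[0] == allele_a) / n
    p_b = sum(1 for h in haplotypes if h[1] == allele_b) / n
    p_ab = sum(1 for h in haplotypes if h == (allele_a, allele_b)) / n

    d = p_ab - p_a * p_b
    denominator = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denominator == 0:
        raise ValueError("both sites must be polymorphic in the sample")

    # Largest |D| attainable given the allele frequencies.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))

    return {"D": d, "D_prime": abs(d) / d_max, "r_squared": d * d / denominator}


if __name__ == "__main__":
    # Hypothetical sample in which cytosines at both sites travel together.
    linked = [("C", "C")] * 5 + [("A", "A")] * 5
    print(linkage_disequilibrium(linked, "C", "C"))
```

Values of |D′| and r² near 1 indicate that the two variants are inherited together in the sample, and values near 0 indicate independence.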

Isolated ICAM, MAPK10, KIAA0861, NUMA1 or GALE Nucleic Acids

Featured herein are isolated ICAM, MAPK10, KIAA0861, NUMA1 or GALE nucleic acids, which include the nucleic acid having the nucleotide sequence of SEQ ID NO: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 or 11, nucleic acid variants, and substantially identical nucleic acids of the foregoing. Nucleotide sequences of the ICAM, MAPK10, KIAA0861, NUMA1 or GALE nucleic acids sometimes are referred to herein as “ICAM, MAPK10, KIAA0861, NUMA1 or GALE nucleotide sequences.” An “ICAM, MAPK10, KIAA0861, NUMA1 or GALE nucleic acid variant” refers to one allele that may have one or more different polymorphic variations as compared to another allele in another subject or the same subject. A polymorphic variation in the ICAM, MAPK10, KIAA0861, NUMA1 or GALE nucleic acid variant may be represented on one or both strands in a double-stranded nucleic acid or on one chromosomal complement (heterozygous) or both chromosomal complements (homozygous).

As used herein, the term “nucleic acid” includes DNA molecules (e.g., a complementary DNA (cDNA) and genomic DNA (gDNA)) and RNA molecules (e.g., mRNA, rRNA, and tRNA) and analogs of DNA or RNA generated, for example, by use of nucleotide analogs. The nucleic acid molecule can be single-stranded and it is often double-stranded. The term “isolated or purified nucleic acid” refers to nucleic acids that are separated from other nucleic acids present in the natural source of the nucleic acid. For example, with regard to genomic DNA, the term “isolated” includes nucleic acids which are separated from the chromosome with which the genomic DNA is naturally associated. An “isolated” nucleic acid is often free of sequences which naturally flank the nucleic acid (i.e., sequences located at the 5′ and/or 3′ ends of the nucleic acid) in the genomic DNA of the organism from which the nucleic acid is derived. For example, in various embodiments, the isolated nucleic acid molecule can contain less than about 5 kb, 4 kb, 3 kb, 2 kb, 1 kb, 0.5 kb or 0.1 kb of 5′ and/or 3′ nucleotide sequences which flank the nucleic acid molecule in genomic DNA of the cell from which the nucleic acid is derived. Moreover, an “isolated” nucleic acid molecule, such as a cDNA molecule, can be substantially free of other cellular material, or culture medium when produced by recombinant techniques, or substantially free of chemical precursors or other chemicals when chemically synthesized. As used herein, the term “ICAM, MAPK10, KIAA0861, NUMA1 or GALE gene” refers to a nucleotide sequence that encodes an ICAM, MAPK10, KIAA0861, NUMA1 or GALE polypeptide.

Also included herein are nucleic acid fragments. These fragments typically are a nucleotide sequence identical to a nucleotide sequence in SEQ ID NO: 1-12, a nucleotide sequence substantially identical to a nucleotide sequence in SEQ ID NO: 1-12, or a nucleotide sequence that is complementary to the foregoing. The nucleic acid fragment may be identical, substantially identical or homologous to a nucleotide sequence in an exon or an intron in SEQ ID NO: 1-5, and may encode a domain or part of a domain or motif of an ICAM, MAPK10, KIAA0861, NUMA1 or GALE polypeptide. Sometimes, the fragment will comprise the polymorphic variation described herein as being associated with breast cancer. The nucleic acid fragment sometimes is 50, 100, or 200 or fewer base pairs in length, and is sometimes about 300, 400, 500, 600, 700, 800, 900, 1000, 1100, 1200, 1300, 1400, 1500, 1600, 1700, 1800, 1900, 2000, 2100, 2200, 2300, 2400, 2500, 2600, 2700, 2800, 2900, 3000, 3100, 3200, 3300, 3400, 3500, 3600, 3800, 4000, 5000, 6000, 7000, 8000, 9000, 10000, 15000, 20000, 30000, 40000, 50000, 60000, 70000, 80000, 90000, 100000, 110000, 120000, 130000, 140000, 150000 or 160000 base pairs in length. A nucleic acid fragment that is complementary to a nucleotide sequence identical or substantially identical to the nucleotide sequence of SEQ ID NO: 1-12 and that hybridizes to such a nucleotide sequence under stringent conditions often is referred to as a “probe.” Nucleic acid fragments often include one or more polymorphic sites, or sometimes have an end that is adjacent to a polymorphic site as described hereafter.

An example of a nucleic acid fragment is an oligonucleotide. As used herein, the term “oligonucleotide” refers to a nucleic acid comprising about 8 to about 50 covalently linked nucleotides, often comprising from about 8 to about 35 nucleotides, and more often from about 10 to about 25 nucleotides. The backbone and nucleotides within an oligonucleotide may be the same as those of naturally occurring nucleic acids, or analogs or derivatives of naturally occurring nucleic acids, provided that oligonucleotides having such analogs or derivatives retain the ability to hybridize specifically to a nucleic acid comprising a targeted polymorphism. Oligonucleotides described herein may be used as hybridization probes or as components of prognostic or diagnostic assays, for example, as described herein.

Oligonucleotides are typically synthesized using standard methods and equipment, such as the ABI 3900 High Throughput DNA Synthesizer and the EXPEDITE™ 8909 Nucleic Acid Synthesizer, both of which are available from Applied Biosystems (Foster City, Calif.). Analogs and derivatives are exemplified in U.S. Pat. Nos. 4,469,863; 5,536,821; 5,541,306; 5,637,683; 5,637,684; 5,700,922; 5,717,083; 5,719,262; 5,739,308; 5,773,601; 5,886,165; 5,929,226; 5,977,296; 6,140,482; WO 00/56746; WO 01/14398, and related publications. Methods for synthesizing oligonucleotides comprising such analogs or derivatives are disclosed, for example, in the patent publications cited above and in U.S. Pat. Nos. 5,614,622; 5,739,314; 5,955,599; 5,962,674; 6,117,992; in WO 00/75372; and in related publications.

Oligonucleotides also may be linked to a second moiety. The second moiety may be an additional nucleotide sequence such as a tail sequence (e.g., a polyadenosine tail), an adapter sequence (e.g., phage M13 universal tail sequence), and others. Alternatively, the second moiety may be a non-nucleotide moiety such as a moiety which facilitates linkage to a solid support or a label to facilitate detection of the oligonucleotide. Such labels include, without limitation, a radioactive label, a fluorescent label, a chemiluminescent label, a paramagnetic label, and the like. The second moiety may be attached to any position of the oligonucleotide, provided the oligonucleotide can hybridize to the nucleic acid comprising the polymorphism.

Uses for Nucleic Acid Sequences

Nucleic acid coding sequences depicted in SEQ ID NO: 1-12 may be used for diagnostic purposes for detection and control of polypeptide expression. Also included herein are oligonucleotide sequences such as antisense RNA, small-interfering RNA (siRNA) and DNA molecules and ribozymes that function to inhibit translation of a polypeptide. Antisense techniques and RNA interference techniques are known in the art and are described herein.

Ribozymes are enzymatic RNA molecules capable of catalyzing the specific cleavage of RNA. The mechanism of ribozyme action involves sequence-specific hybridization of the ribozyme molecule to complementary target RNA, followed by an endonucleolytic cleavage. Ribozymes may be engineered hammerhead motif ribozyme molecules that specifically and efficiently catalyze endonucleolytic cleavage of RNA sequences corresponding to or complementary to the nucleotide sequences set forth in SEQ ID NO: 1-12. Specific ribozyme cleavage sites within any potential RNA target are initially identified by scanning the target molecule for ribozyme cleavage sites which include the following sequences: GUA, GUU and GUC. Once identified, short RNA sequences of between fifteen (15) and twenty (20) ribonucleotides corresponding to the region of the target gene containing the cleavage site may be evaluated for predicted structural features such as secondary structure that may render the oligonucleotide sequence unsuitable. The suitability of candidate targets may also be evaluated by testing their accessibility to hybridization with complementary oligonucleotides, using ribonuclease protection assays.
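The initial scanning step described above can be sketched in a few lines of Python. The example below locates GUA, GUU and GUC triplets in a target sequence and extracts a window of 15 to 20 ribonucleotides around each site. It is a simplified illustration: the function names are hypothetical, centering the triplet in the window is an assumption of this sketch, and the subsequent evaluation of secondary structure and hybridization accessibility is not modeled.

```python
import re


def find_cleavage_sites(rna):
    """Return (0-based index, triplet) for each GUA, GUU or GUC in `rna`.

    DNA input is accepted and converted by replacing T with U.
    """
    rna = rna.upper().replace("T", "U")
    return [(m.start(), m.group()) for m in re.finditer(r"GU[AUC]", rna)]


def candidate_windows(rna, length=15):
    """Return (index, triplet, window) for each cleavage site.

    Each window is `length` ribonucleotides long (15 to 20) and positions
    the triplet as close to its center as possible. Sites too close to
    either end of the target to fit a full window are skipped.
    """
    if not 15 <= length <= 20:
        raise ValueError("length must be between 15 and 20 ribonucleotides")
    rna = rna.upper().replace("T", "U")
    left = (length - 3) // 2  # bases placed 5' of the triplet
    windows = []
    for index, triplet in find_cleavage_sites(rna):
        start = index - left
        end = start + length
        if start >= 0 and end <= len(rna):
            windows.append((index, triplet, rna[start:end]))
    return windows


if __name__ == "__main__":
    target = "ACGACGACGUCACGACGACG"  # hypothetical target, not from SEQ ID NO: 1-12
    print(find_cleavage_sites(target))
    print(candidate_windows(target, length=15))
```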

Antisense RNA and DNA molecules, siRNA and ribozymes may be prepared by any method known in the art for the synthesis of RNA molecules. These include techniques for chemically synthesizing oligodeoxyribonucleotides well known in the art such as solid phase phosphoramidite chemical synthesis. Alternatively, RNA molecules may be generated by in vitro and in vivo transcription of DNA sequences encoding the antisense RNA molecule. Such DNA sequences may be incorporated into a wide variety of vectors which incorporate suitable RNA polymerase promoters such as the T7 or SP6 polymerase promoters. Alternatively, antisense cDNA constructs that synthesize antisense RNA constitutively or inducibly, depending on the promoter used, can be introduced stably into cell lines.

DNA encoding a polypeptide also may have a number of uses for the diagnosis of diseases, including breast cancer, resulting from aberrant expression of a target gene described herein. For example, the nucleic acid sequence may be used in hybridization assays of biopsies or autopsies to diagnose abnormalities of expression or function (e.g., Southern or Northern blot analysis, in situ hybridization assays).

In addition, the expression of a polypeptide during embryonic development may also be determined using nucleic acid encoding the polypeptide. As addressed, infra, production of functionally impaired polypeptide can be the cause of various disease states, such as breast cancer. In situ hybridizations using polynucleotide probes may be employed to predict problems related to breast cancer. Further, as indicated, infra, administration of human active polypeptide, recombinantly produced as described herein, may be used to treat disease states related to functionally impaired polypeptide. Alternatively, gene therapy approaches may be employed to remedy deficiencies of functional polypeptide or to replace or compete with dysfunctional polypeptide.

Expression Vectors, Host Cells, and Genetically Engineered Cells

Provided herein are nucleic acid vectors, often expression vectors, which contain an ICAM, MAPK10, KIAA0861, NUMA1 or GALE nucleic acid. As used herein, the term “vector” refers to a nucleic acid molecule capable of transporting another nucleic acid to which it has been linked and can include a plasmid, cosmid, or viral vector. The vector can be capable of autonomous replication or it can integrate into a host DNA. Viral vectors may include, for example, replication defective retroviruses, adenoviruses and adeno-associated viruses.

A vector can include an ICAM, MAPK10, KIAA0861, NUMA1 or GALE nucleic acid in a form suitable for expression of the nucleic acid in a host cell. The recombinant expression vector typically includes one or more regulatory sequences operatively linked to the nucleic acid sequence to be expressed. The term “regulatory sequence” includes promoters, enhancers and other expression control elements (e.g., polyadenylation signals). Regulatory sequences include those that direct constitutive expression of a nucleotide sequence, as well as tissue-specific regulatory and/or inducible sequences. The design of the expression vector can depend on such factors as the choice of the host cell to be transformed, the level of expression of polypeptide desired, and the like. Expression vectors can be introduced into host cells to produce ICAM, MAPK10, KIAA0861, NUMA1 or GALE polypeptides, including fusion polypeptides, encoded by ICAM, MAPK10, KIAA0861, NUMA1 or GALE nucleic acids.

Recombinant expression vectors can be designed for expression of ICAM, MAPK10, KIAA0861, NUMA1 or GALE polypeptides in prokaryotic or eukaryotic cells. For example, ICAM, MAPK10, KIAA0861, NUMA1 or GALE polypeptides can be expressed in E. coli, insect cells (e.g., using baculovirus expression vectors), yeast cells, or mammalian cells. Suitable host cells are discussed further in Goeddel, Gene Expression Technology: Methods in Enzymology 185, Academic Press, San Diego, Calif. (1990). Alternatively, the recombinant expression vector can be transcribed and translated in vitro, for example using T7 promoter regulatory sequences and T7 polymerase.

Expression of polypeptides in prokaryotes is most often carried out in E. coli with vectors containing constitutive or inducible promoters directing the expression of either fusion or non-fusion polypeptides. Fusion vectors add a number of amino acids to a polypeptide encoded therein, usually to the amino terminus of the recombinant polypeptide. Such fusion vectors typically serve three purposes: 1) to increase expression of recombinant polypeptide; 2) to increase the solubility of the recombinant polypeptide; and 3) to aid in the purification of the recombinant polypeptide by acting as a ligand in affinity purification. Often, a proteolytic cleavage site is introduced at the junction of the fusion moiety and the recombinant polypeptide to enable separation of the recombinant polypeptide from the fusion moiety subsequent to purification of the fusion polypeptide. Enzymes used for this cleavage, and their cognate recognition sequences, include Factor Xa, thrombin and enterokinase. Typical fusion expression vectors include pGEX (Pharmacia Biotech Inc; Smith & Johnson, Gene 67: 31-40 (1988)), pMAL (New England Biolabs, Beverly, Mass.) and pRIT5 (Pharmacia, Piscataway, N.J.) which fuse glutathione S-transferase (GST), maltose E binding protein, or protein A, respectively, to the target recombinant polypeptide.

Purified fusion polypeptides can be used in screening assays and to generate antibodies specific for ICAM, MAPK10, KIAA0861, NUMA1 or GALE polypeptides. In a therapeutic embodiment, fusion polypeptide expressed in a retroviral expression vector is used to infect bone marrow cells that are subsequently transplanted into irradiated recipients. The pathology of the subject recipient is then examined after sufficient time has passed (e.g., six (6) weeks).

Expressing the polypeptide in host bacteria with an impaired capacity to proteolytically cleave the recombinant polypeptide is often used to maximize recombinant polypeptide expression (Gottesman, S., Gene Expression Technology: Methods in Enzymology, Academic Press, San Diego, Calif. 185: 119-128 (1990)). Another strategy is to alter the nucleotide sequence of the nucleic acid to be inserted into an expression vector so that the individual codons for each amino acid are those preferentially utilized in E. coli (Wada et al., Nucleic Acids Res. 20: 2111-2118 (1992)). Such alteration of nucleotide sequences can be carried out by standard DNA synthesis techniques.

When used in mammalian cells, the expression vector's control functions are often provided by viral regulatory elements. For example, commonly used promoters are derived from polyoma, Adenovirus 2, cytomegalovirus and Simian Virus 40. Recombinant mammalian expression vectors are often capable of directing expression of the nucleic acid in a particular cell type (e.g., tissue-specific regulatory elements are used to express the nucleic acid). Non-limiting examples of suitable tissue-specific promoters include an albumin promoter (liver-specific; Pinkert et al., Genes Dev. 1: 268-277 (1987)), lymphoid-specific promoters (Calame & Eaton, Adv. Immunol. 43: 235-275 (1988)), promoters of T cell receptors (Winoto & Baltimore, EMBO J. 8: 729-733 (1989)), promoters of immunoglobulins (Banerji et al., Cell 33: 729-740 (1983); Queen & Baltimore, Cell 33: 741-748 (1983)), neuron-specific promoters (e.g., the neurofilament promoter; Byrne & Ruddle, Proc. Natl. Acad. Sci. USA 86: 5473-5477 (1989)), pancreas-specific promoters (Edlund et al., Science 230: 912-916 (1985)), and mammary gland-specific promoters (e.g., milk whey promoter; U.S. Pat. No. 4,873,316 and European Application Publication No. 264,166). Developmentally-regulated promoters are sometimes utilized, for example, the murine hox promoters (Kessel & Gruss, Science 249: 374-379 (1990)) and the α-fetoprotein promoter (Camper & Tilghman, Genes Dev. 3: 537-546 (1989)).

An ICAM, MAPK10, KIAA0861, NUMA1 or GALE nucleic acid may also be cloned into an expression vector in an antisense orientation. Regulatory sequences (e.g., viral promoters and/or enhancers) operatively linked to an ICAM, MAPK10, KIAA0861, NUMA1 or GALE nucleic acid cloned in the antisense orientation can be chosen for directing constitutive, tissue specific or cell type specific expression of antisense RNA in a variety of cell types. Antisense expression vectors can be in the form of a recombinant plasmid, phagemid or attenuated virus. For a discussion of the regulation of gene expression using antisense genes see Weintraub et al., Antisense RNA as a molecular tool for genetic analysis, Reviews—Trends in Genetics, Vol. 1(1) (1986).

Also provided herein are host cells that include an ICAM, MAPK10, KIAA0861, NUMA1 or GALE nucleic acid within a recombinant expression vector or ICAM, MAPK10, KIAA0861, NUMA1 or GALE nucleic acid sequence fragments which allow it to homologously recombine into a specific site of the host cell genome. The terms “host cell” and “recombinant host cell” are used interchangeably herein. Such terms refer not only to the particular subject cell but also to the progeny or potential progeny of such a cell. Because certain modifications may occur in succeeding generations due to either mutation or environmental influences, such progeny may not, in fact, be identical to the parent cell, but are still included within the scope of the term as used herein. A host cell can be any prokaryotic or eukaryotic cell. For example, an ICAM, MAPK10, KIAA0861, NUMA1 or GALE polypeptide can be expressed in bacterial cells such as E. coli, insect cells, yeast or mammalian cells (such as Chinese hamster ovary cells (CHO) or COS cells). Other suitable host cells are known to those skilled in the art.

Vectors can be introduced into host cells via conventional transformation or transfection techniques. As used herein, the terms “transformation” and “transfection” are intended to refer to a variety of art-recognized techniques for introducing foreign nucleic acid (e.g., DNA) into a host cell, including calcium phosphate or calcium chloride co-precipitation, transduction/infection, DEAE-dextran-mediated transfection, lipofection, or electroporation.

A host cell provided herein can be used to produce (i.e., express) an ICAM, MAPK10, KIAA0861, NUMA1 or GALE polypeptide. Accordingly, further provided are methods for producing an ICAM, MAPK10, KIAA0861, NUMA1 or GALE polypeptide using the host cells described herein. In one embodiment, the method includes culturing host cells into which a recombinant expression vector encoding an ICAM, MAPK10, KIAA0861, NUMA1 or GALE polypeptide has been introduced in a suitable medium such that an ICAM, MAPK10, KIAA0861, NUMA1 or GALE polypeptide is produced. In another embodiment, the method further includes isolating an ICAM, MAPK10, KIAA0861, NUMA1 or GALE polypeptide from the medium or the host cell.

Also provided are cells or purified preparations of cells which include a ICAM, MAPK10, KIAA0861, NUMA1 or GALE transgene, or which otherwise misexpress ICAM, MAPK10, KIAA0861, NUMA1 or GALE polypeptide. Cell preparations can consist of human or non-human cells, e.g., rodent cells, e.g., mouse or rat cells, rabbit cells, or pig cells. In certain embodiments, the cell or cells include a ICAM, MAPK10, KIAA0861, NUMA1 or GALE transgene (e.g., a heterologous form of a ICAM, MAPK10, KIAA0861, NUMA1 or GALE such as a human gene expressed in non-human cells). The ICAM, MAPK10, KIAA0861, NUMA1 or GALE transgene can be misexpressed, e.g., overexpressed or underexpressed. In other embodiments, the cell or cells include a gene which misexpresses an endogenous ICAM, MAPK10, KIAA0861, NUMA1 or GALE polypeptide (e.g., expression of a gene is disrupted, also known as a knockout). Such cells can serve as a model for studying disorders which are related to mutated or mis-expressed ICAM, MAPK10, KIAA0861, NUMA1 or GALE alleles or for use in drug screening. Also provided are human cells (e.g., hematopoietic stem cells) transformed with a ICAM, MAPK10, KIAA0861, NUMA1 or GALE nucleic acid.

Also provided are cells or a purified preparation thereof (e.g., human cells) in which an endogenous ICAM, MAPK10, KIAA0861, NUMA1 or GALE nucleic acid is under the control of a regulatory sequence that does not normally control the expression of the endogenous ICAM, MAPK10, KIAA0861, NUMA1 or GALE gene. The expression characteristics of an endogenous gene within a cell (e.g., a cell line or microorganism) can be modified by inserting a heterologous DNA regulatory element into the genome of the cell such that the inserted regulatory element is operably linked to the endogenous ICAM, MAPK10, KIAA0861, NUMA1 or GALE gene. For example, an endogenous ICAM, MAPK10, KIAA0861, NUMA1 or GALE gene (e.g., a gene which is “transcriptionally silent,” not normally expressed, or expressed only at very low levels) may be activated by inserting a regulatory element which is capable of promoting the expression of a normally expressed gene product in that cell. Techniques such as targeted homologous recombination can be used to insert the heterologous DNA as described in, e.g., Chappel, U.S. Pat. No. 5,272,071; WO 91/06667, published on May 16, 1991.

Transgenic Animals

Non-human transgenic animals that express a heterologous ICAM, MAPK10,KIAA0861, NUMA1 or GALE polypeptide (e.g., expressed from a ICAM,MAPK10, KIAA0861, NUMA1 or GALE nucleic acid isolated from anotherorganism) can be generated. Such animals are useful for studying thefunction and/or activity of a ICAM, MAPK10, KIAA0861, NUMA1 or GALEpolypeptide and for identifying and/or evaluating modulators of ICAM,MAPK10, KIAA0861, NUMA1 or GALE nucleic acid and ICAM, MAPK10, KIAA0861,NUMA1 or GALE polypeptide activity. As used herein, a “transgenicanimal” is a non-human animal such as a mammal (e.g., a non-humanprimate such as chimpanzee, baboon, or macaque; an ungulate such as anequine, bovine, or caprine; or a rodent such as a rat, a mouse, or anIsraeli sand rat), a bird (e.g., a chicken or a turkey), an amphibian(e.g., a frog, salamander, or newt), or an insect (e.g., Drosophilamelanogaster), in which one or more of the cells of the animal includesa ICAM, MAPK10, KIAA0861, NUMA1 or GALE transgene. A transgene isexogenous DNA or a rearrangement (e.g., a deletion of endogenouschromosomal DNA) that is often integrated into or occurs in the genomeof cells in a transgenic animal. A transgene can direct expression of anencoded gene product in one or more cell types or tissues of thetransgenic animal, and other transgenes can reduce expression (e.g., aknockout). Thus, a transgenic animal can be one in which an endogenousICAM, MAPK10, KIAA0861, NUMA1 or GALE gene has been altered byhomologous recombination between the endogenous gene and an exogenousDNA molecule introduced into a cell of the animal (e.g., an embryoniccell of the animal) prior to development of the animal.

Intronic sequences and polyadenylation signals can also be included inthe transgene to increase expression efficiency of the transgene. One ormore tissue-specific regulatory sequences can be operably linked to aICAM, MAPK10, KIAA0861, NUMA1 or GALE transgene to direct expression ofa ICAM, MAPK10, KIAA0861, NUMA1 or GALE polypeptide to particular cells.A transgenic founder animal can be identified based upon the presence ofa ICAM, MAPK10, KIAA0861, NUMA1 or GALE transgene in its genome and/orexpression of ICAM, MAPK10, KIAA0861, NUMA1 or GALE mRNA in tissues orcells of the animals. A transgenic founder animal can then be used tobreed additional animals carrying the transgene. Moreover, transgenicanimals carrying a transgene encoding a ICAM, MAPK10, KIAA0861, NUMA1 orGALE polypeptide can further be bred to other transgenic animalscarrying other transgenes.

ICAM, MAPK10, KIAA0861, NUMA1 or GALE polypeptides can be expressed intransgenic animals or plants by introducing, for example, a nucleic acidencoding the polypeptide into the genome of an animal. In certainembodiments the nucleic acid is placed under the control of a tissuespecific promoter, e.g., a milk or egg specific promoter, and recoveredfrom the milk or eggs produced by the animal. Also included is apopulation of cells from a transgenic animal.

ICAM, MAPK10, KIAA0861, NUMA1 and GALE Polypeptides

Featured herein are isolated ICAM, MAPK10, KIAA0861, NUMA1 or GALEpolypeptides, which include polypeptides having amino acid sequences setforth in SEQ ID NO: 13-18, and substantially identical polypeptidesthereof. Such polypeptides sometimes are proteins or peptides. A ICAM,MAPK10, KIAA0861, NUMA1 or GALE polypeptide is a polypeptide encoded bya ICAM, MAPK10, KIAA0861, NUMA1 or GALE nucleic acid, where one nucleicacid can encode one or more different polypeptides. An “isolated” or“purified” polypeptide or protein is substantially free of cellularmaterial or other contaminating proteins from the cell or tissue sourcefrom which the protein is derived, or substantially free from chemicalprecursors or other chemicals when chemically synthesized. In oneembodiment, the language “substantially free” means preparation of aICAM, MAPK10, KIAA0861, NUMA1 or GALE polypeptide or ICAM, MAPK10,KIAA0861, NUMA1 or GALE polypeptide variant having less than about 30%,20%, 10% and sometimes 5% (by dry weight), of non-ICAM, MAPK10,KIAA0861, NUMA1 or GALE polypeptide (also referred to herein as a“contaminating protein”), or of chemical precursors or non-ICAM, MAPK10,KIAA0861, NUMA1 or GALE chemicals. When the ICAM, MAPK10, KIAA0861,NUMA1 or GALE polypeptide or a biologically active portion thereof isrecombinantly produced, it is also often substantially free of culturemedium, specifically, where culture medium represents less than about20%, sometimes less than about 10%, and often less than about 5% of thevolume of the polypeptide preparation. Isolated or purified ICAM,MAPK10, KIAA0861, NUMA1 or GALE polypeptide preparations are sometimes0.01 milligrams or more or 0.1 milligrams or more, and often 1.0milligrams or more and 10 milligrams or more in dry weight. 
In specific embodiments, a polypeptide comprises a leucine at amino acid position 359 in SEQ ID NO: 17, a leucine at amino acid position 378 in SEQ ID NO: 17, or an alanine at amino acid position 857 in SEQ ID NO: 17, or a ICAM5 polypeptide comprises a proline at amino acid position 352 or an alanine at amino acid position 348 in SEQ ID NO: 15.

In another aspect, featured herein are ICAM, MAPK10, KIAA0861, NUMA1 orGALE polypeptides and biologically active or antigenic fragments thereofthat are useful as reagents or targets in assays applicable toprevention, treatment or diagnosis of breast cancer. In anotherembodiment, provided herein are ICAM, MAPK10, KIAA0861, NUMA1 or GALEpolypeptides having a ICAM, MAPK10, KIAA0861, NUMA1 or GALE activity oractivities.

Further included herein are ICAM, MAPK10, KIAA0861, NUMA1 or GALEpolypeptide fragments. The polypeptide fragment may be a domain or partof a domain of a ICAM, MAPK10, KIAA0861, NUMA1 or GALE polypeptide. Thepolypeptide fragment is often 50 or fewer, 100 or fewer, or 200 or feweramino acids in length, and is sometimes 300, 400, 500, 600, 700, or 900or fewer amino acids in length. In certain embodiments, the polypeptidefragment comprises, consists essentially of, or consists of, at least 6consecutive amino acids and not more than 1211 consecutive amino acidsof SEQ ID NO: 13-18, or the polypeptide fragment comprises, consistsessentially of, or consists of, at least 6 consecutive amino acids andnot more than 543 consecutive amino acids of SEQ ID NO: 13-18.

ICAM, MAPK10, KIAA0861, NUMA1 or GALE polypeptides described herein canbe used as immunogens to produce anti-ICAM, MAPK10, KIAA0861, NUMA1 orGALE antibodies in a subject, to purify ICAM, MAPK10, KIAA0861, NUMA1 orGALE ligands or binding partners, and in screening assays to identifymolecules which inhibit or enhance the interaction of ICAM, MAPK10,KIAA0861, NUMA1 or GALE with a ICAM, MAPK10, KIAA0861, NUMA1 or GALEsubstrate. Full-length ICAM, MAPK10, KIAA0861, NUMA1 or GALEpolypeptides and polynucleotides encoding the same may be specificallysubstituted for a ICAM, MAPK10, KIAA0861, NUMA1 or GALE polypeptidefragment or polynucleotide encoding the same in any embodiment describedherein.

Substantially identical polypeptides may depart from the amino acidsequences set forth in SEQ ID NO: 13-18 in different manners. Forexample, conservative amino acid modifications may be introduced at oneor more positions in the amino acid sequences of SEQ ID NO: 13-18. A“conservative amino acid substitution” is one in which the amino acid isreplaced by another amino acid having a similar structure and/orchemical function. Families of amino acid residues having similarstructures and functions are well known. These families include aminoacids with basic side chains (e.g., lysine, arginine, histidine), acidicside chains (e.g., aspartic acid, glutamic acid), uncharged polar sidechains (e.g., glycine, asparagine, glutamine, serine, threonine,tyrosine, cysteine), nonpolar side chains (e.g., alanine, valine,leucine, isoleucine, proline, phenylalanine, methionine, tryptophan),beta-branched side chains (e.g., threonine, valine, isoleucine) andaromatic side chains (e.g., tyrosine, phenylalanine, tryptophan,histidine). Also, essential and non-essential amino acids may bereplaced. A “non-essential” amino acid is one that can be alteredwithout abolishing or substantially altering the biological function ofa ICAM, MAPK10, KIAA0861, NUMA1 or GALE polypeptide, whereas altering an“essential” amino acid abolishes or substantially alters the biologicalfunction of a ICAM, MAPK10, KIAA0861, NUMA1 or GALE polypeptide. Aminoacids that are conserved among ICAM, MAPK10, KIAA0861, NUMA1 or GALEpolypeptides are typically essential amino acids.
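
The side-chain families listed above can be expressed as a small lookup, which makes it easy to check whether a given substitution stays within a family. The following is a minimal sketch only: the one-letter amino acid codes, the family names, and the function names `shared_families` and `is_conservative` are illustrative assumptions and do not appear in this description.

```python
# Side-chain families as listed in the preceding paragraph, written with
# standard one-letter amino acid codes. Family names are illustrative labels.
FAMILIES = {
    "basic": set("KRH"),             # lysine, arginine, histidine
    "acidic": set("DE"),             # aspartic acid, glutamic acid
    "uncharged_polar": set("GNQSTYC"),
    "nonpolar": set("AVLIPFMW"),
    "beta_branched": set("TVI"),     # threonine, valine, isoleucine
    "aromatic": set("YFWH"),
}


def shared_families(a, b):
    """Return the sorted names of every family containing both residues."""
    a, b = a.upper(), b.upper()
    return sorted(
        name for name, members in FAMILIES.items()
        if a in members and b in members
    )


def is_conservative(a, b):
    """Treat a substitution as conservative if the two residues differ
    and share at least one side-chain family (an assumed, simple rule)."""
    return a.upper() != b.upper() and bool(shared_families(a, b))
```

Under this simple rule a lysine-to-arginine change counts as conservative and a lysine-to-aspartic-acid change does not. Because a residue can belong to more than one family (e.g., valine is both nonpolar and beta-branched), the helper reports all shared families rather than a single one.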

Also, ICAM, MAPK10, KIAA0861, NUMA1 or GALE polypeptides and polypeptidevariants may exist as chimeric or fusion polypeptides. As used herein, aICAM, MAPK10, KIAA0861, NUMA1 or GALE “chimeric polypeptide” or “fusionpolypeptide” includes a ICAM, MAPK10, KIAA0861, NUMA1 or GALEpolypeptide linked to a non-ICAM, MAPK10, KIAA0861, NUMA1 or GALEpolypeptide. A “non-ICAM, MAPK10, KIAA0861, NUMA1 or GALE polypeptide”refers to a polypeptide having an amino acid sequence corresponding to apolypeptide which is not substantially identical to the ICAM, MAPK10,KIAA0861, NUMA1 or GALE polypeptide, which includes, for example, apolypeptide that is different from the ICAM, MAPK10, KIAA0861, NUMA1 orGALE polypeptide and derived from the same or a different organism. TheICAM, MAPK10, KIAA0861, NUMA1 or GALE polypeptide in the fusionpolypeptide can correspond to an entire or nearly entire ICAM, MAPK10,KIAA0861, NUMA1 or GALE polypeptide or a fragment thereof. The non-ICAM,MAPK10, KIAA0861, NUMA1 or GALE polypeptide can be fused to theN-terminus or C-terminus of the ICAM, MAPK10, KIAA0861, NUMA1 or GALEpolypeptide.

Fusion polypeptides can include a moiety having high affinity for aligand. For example, the fusion polypeptide can be a GST-ICAM, MAPK10,KIAA0861, NUMA1 or GALE fusion polypeptide in which the ICAM, MAPK10,KIAA0861, NUMA1 or GALE sequences are fused to the C-terminus of the GSTsequences, or a polyhistidine-ICAM, MAPK10, KIAA0861, NUMA1 or GALEfusion polypeptide in which the ICAM, MAPK10, KIAA0861, NUMA1 or GALEpolypeptide is fused at the N- or C-terminus to a string of histidineresidues. Such fusion polypeptides can facilitate purification ofrecombinant ICAM, MAPK10, KIAA0861, NUMA1 or GALE. Expression vectorsare commercially available that already encode a fusion moiety (e.g., aGST polypeptide), and a ICAM, MAPK10, KIAA0861, NUMA1 or GALE nucleicacid can be cloned into an expression vector such that the fusion moietyis linked in-frame to the ICAM, MAPK10, KIAA0861, NUMA1 or GALEpolypeptide. Further, the fusion polypeptide can be a ICAM, MAPK10,KIAA0861, NUMA1 or GALE polypeptide containing a heterologous signalsequence at its N-terminus. In certain host cells (e.g., mammalian hostcells), expression, secretion, cellular internalization, and cellularlocalization of a ICAM, MAPK10, KIAA0861, NUMA1 or GALE polypeptide canbe increased through use of a heterologous signal sequence. Fusionpolypeptides can also include all or a part of a serum polypeptide(e.g., an IgG constant region or human serum albumin).

ICAM, MAPK10, KIAA0861, NUMA1 or GALE polypeptides or fragments thereof can be incorporated into pharmaceutical compositions and administered to a subject in vivo. Administration of these ICAM, MAPK10, KIAA0861, NUMA1 or GALE polypeptides can be used to affect the bioavailability of a ICAM, MAPK10, KIAA0861, NUMA1 or GALE substrate and may effectively increase or decrease ICAM, MAPK10, KIAA0861, NUMA1 or GALE biological activity in a cell or effectively supplement dysfunctional or hyperactive ICAM, MAPK10, KIAA0861, NUMA1 or GALE polypeptide. ICAM, MAPK10, KIAA0861, NUMA1 or GALE fusion polypeptides may be useful therapeutically for the treatment of disorders caused by, for example, (i) aberrant modification or mutation of a gene encoding a ICAM, MAPK10, KIAA0861, NUMA1 or GALE polypeptide; (ii) mis-regulation of the ICAM, MAPK10, KIAA0861, NUMA1 or GALE gene; and (iii) aberrant post-translational modification of a ICAM, MAPK10, KIAA0861, NUMA1 or GALE polypeptide. Also, ICAM, MAPK10, KIAA0861, NUMA1 or GALE polypeptides can be used as immunogens to produce anti-ICAM, MAPK10, KIAA0861, NUMA1 or GALE antibodies in a subject, to purify ICAM, MAPK10, KIAA0861, NUMA1 or GALE ligands or binding partners, and in screening assays to identify molecules which inhibit or enhance the interaction of ICAM, MAPK10, KIAA0861, NUMA1 or GALE with a ICAM, MAPK10, KIAA0861, NUMA1 or GALE substrate.

In addition, polypeptides can be chemically synthesized using techniques known in the art (see, e.g., Creighton, 1983, Proteins. New York, N.Y.: W. H. Freeman and Company; and Hunkapiller et al. (1984) Nature July 12-18;310(5973):105-11). For example, a relatively short polypeptide fragment can be synthesized by use of a peptide synthesizer. Furthermore, if desired, non-classical amino acids or chemical amino acid analogs can be introduced as a substitution or addition into the fragment sequence. Non-classical amino acids include, but are not limited to, the D-isomers of the common amino acids, 2,4-diaminobutyric acid, α-amino isobutyric acid, 4-aminobutyric acid, Abu, 2-amino butyric acid, γ-Abu, ε-Ahx, 6-amino hexanoic acid, Aib, 2-amino isobutyric acid, 3-amino propionic acid, ornithine, norleucine, norvaline, hydroxyproline, sarcosine, citrulline, homocitrulline, cysteic acid, t-butylglycine, t-butylalanine, phenylglycine, cyclohexylalanine, β-alanine, fluoroamino acids, designer amino acids such as β-methyl amino acids, Cα-methyl amino acids, Nα-methyl amino acids, and amino acid analogs in general. Furthermore, the amino acid can be D (dextrorotatory) or L (levorotatory).

Also included are polypeptide fragments which are differentiallymodified during or after translation, e.g., by glycosylation,acetylation, phosphorylation, amidation, derivatization by knownprotecting/blocking groups, proteolytic cleavage, linkage to an antibodymolecule or other cellular ligand, and the like. Any of numerouschemical modifications may be carried out by known techniques, includingbut not limited, to specific chemical cleavage by cyanogen bromide,trypsin, chymotrypsin, papain, V8 protease, NaBH₄; acetylation,formylation, oxidation, reduction; metabolic synthesis in the presenceof tunicamycin; and the like.

Additional post-translational modifications include, for example, N-linked or O-linked carbohydrate chains, processing of N-terminal or C-terminal ends, attachment of chemical moieties to the amino acid backbone, chemical modifications of N-linked or O-linked carbohydrate chains, and addition or deletion of an N-terminal methionine residue as a result of prokaryotic host cell expression. The polypeptide fragments may also be modified with a detectable label, such as an enzymatic, fluorescent, isotopic or affinity label to allow for detection and isolation of the polypeptide.

Also provided are chemically modified polypeptide derivatives that may provide additional advantages such as increased solubility, stability and circulating time of the polypeptide, or decreased immunogenicity. See U.S. Pat. No. 4,179,337. The chemical moieties for derivatization may be selected from water soluble polymers such as polyethylene glycol, ethylene glycol/propylene glycol copolymers, carboxymethylcellulose, dextran, polyvinyl alcohol and the like. The polypeptides may be modified at random positions within the molecule, or at predetermined positions within the molecule and may include one, two, three or more attached chemical moieties.

The polymer may be of any molecular weight, and may be branched orunbranched. For polyethylene glycol, the molecular weight is betweenabout 1 kDa and about 100 kDa (the term “about” indicating that inpreparations of polyethylene glycol, some molecules will weigh more,some less, than the stated molecular weight) for ease in handling andmanufacturing. Other sizes may be used, depending on the desiredtherapeutic profile (e.g., the duration of sustained release desired,the effects, if any on biological activity, the ease in handling, thedegree or lack of antigenicity and other known effects of thepolyethylene glycol to a therapeutic protein or analog).

The polyethylene glycol molecules (or other chemical moieties) should be attached to the polypeptide with consideration of effects on functional or antigenic domains of the polypeptide. There are a number of attachment methods available to those skilled in the art, e.g., EP 0 401 384, herein incorporated by reference (coupling PEG to G-CSF); see also Malik et al. (1992) Exp Hematol. September;20(8):1028-35 (reporting pegylation of GM-CSF using tresyl chloride). For example, polyethylene glycol may be covalently bound through amino acid residues via a reactive group, such as a free amino or carboxyl group. Reactive groups are those to which an activated polyethylene glycol molecule may be bound. The amino acid residues having a free amino group may include lysine residues and the N-terminal amino acid residues; those having a free carboxyl group may include aspartic acid residues, glutamic acid residues and the C-terminal amino acid residue. Sulfhydryl groups may also be used as a reactive group for attaching the polyethylene glycol molecules. A polymer sometimes is attached at an amino group, such as attachment at the N-terminus or lysine group.

One may specifically desire proteins chemically modified at the N-terminus. Using polyethylene glycol as an illustration of the present composition, one may select from a variety of polyethylene glycol molecules (by molecular weight, branching, and the like), the proportion of polyethylene glycol molecules to protein (polypeptide) molecules in the reaction mix, the type of pegylation reaction to be performed, and the method of obtaining the selected N-terminally pegylated protein. The method of obtaining the N-terminally pegylated preparation (i.e., separating this moiety from other monopegylated moieties if necessary) may be by purification of the N-terminally pegylated material from a population of pegylated protein molecules. Selective chemical modification of proteins at the N-terminus may be accomplished by reductive alkylation, which exploits differential reactivity of different types of primary amino groups (lysine versus the N-terminal) available for derivatization in a particular protein. Under the appropriate reaction conditions, substantially selective derivatization of the protein at the N-terminus with a carbonyl group containing polymer is achieved.

Substantially Identical Nucleic Acids and Polypeptides

Nucleotide sequences and polypeptide sequences that are substantially identical to a ICAM, MAPK10, KIAA0861, NUMA1 or GALE nucleotide sequence and the ICAM, MAPK10, KIAA0861, NUMA1 or GALE polypeptide sequences encoded by those nucleotide sequences are included herein. The term “substantially identical” as used herein refers to two or more nucleic acids or polypeptides sharing one or more identical nucleotide sequences or polypeptide sequences, respectively. Included are nucleotide sequences or polypeptide sequences that are 55% or more, 60% or more, 65% or more, 70% or more, 75% or more, 80% or more, 85% or more, 90% or more, or 95% or more (each often within a 1%, 2%, 3% or 4% variability) identical to the nucleotide sequences in SEQ ID NO: 1-12 or the encoded ICAM, MAPK10, KIAA0861, NUMA1 or GALE polypeptide amino acid sequences. One test for determining whether two nucleic acids are substantially identical is to determine the percent of identical nucleotide sequences or polypeptide sequences shared between the nucleic acids or polypeptides.

Calculations of sequence identity are often performed as follows. Sequences are aligned for optimal comparison purposes (e.g., gaps can be introduced in one or both of a first and a second amino acid or nucleic acid sequence for optimal alignment and non-homologous sequences can be disregarded for comparison purposes). The length of a reference sequence aligned for comparison purposes is sometimes 30% or more, 40% or more, 50% or more, often 60% or more, and more often 70% or more, 80% or more, 90% or more, or 100% of the length of the reference sequence. The nucleotides or amino acids at corresponding nucleotide or polypeptide positions, respectively, are then compared between the two sequences. When a position in the first sequence is occupied by the same nucleotide or amino acid as the corresponding position in the second sequence, the nucleotides or amino acids are deemed to be identical at that position. The percent identity between the two sequences is a function of the number of identical positions shared by the sequences, taking into account the number of gaps, and the length of each gap, introduced for optimal alignment of the two sequences.
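
The position-by-position comparison described above can be illustrated with a short sketch. It assumes the two sequences have already been aligned (gaps written as `-`) and adopts one simple convention that is an assumption here, not a definition taken from this description: identical positions are divided by the number of alignment columns, so every gapped column lowers the result. The function name `percent_identity` is hypothetical.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity of two pre-aligned sequences of equal length.

    Assumed convention: matches / alignment columns * 100, where a column
    containing a gap in one sequence counts as a non-identical position.
    Columns that are gaps in both sequences are ignored.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")

    columns = 0
    matches = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == gap and y == gap:
            continue  # uninformative column, skip it
        columns += 1
        if x == y:
            matches += 1  # same nucleotide or amino acid at this position

    if columns == 0:
        raise ValueError("alignment contains no comparable positions")
    return 100.0 * matches / columns
```

For example, aligning `ACGT-A` against `ACCTTA` gives four identical positions out of six columns. Other conventions (for example, dividing by the length of the shorter sequence) give different values, so the convention should be stated whenever a threshold such as "90% or more identical" is applied.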

Comparison of sequences and determination of percent identity between two sequences can be accomplished using a mathematical algorithm. Percent identity between two amino acid or nucleotide sequences can be determined using the algorithm of Meyers & Miller, CABIOS 4: 11-17 (1989), which has been incorporated into the ALIGN program (version 2.0), using a PAM120 weight residue table, a gap length penalty of 12 and a gap penalty of 4. Also, percent identity between two amino acid sequences can be determined using the Needleman & Wunsch, J. Mol. Biol. 48: 444-453 (1970) algorithm which has been incorporated into the GAP program in the GCG software package (available at the world wide web address gcg.com), using either a BLOSUM 62 matrix or a PAM250 matrix, and a gap weight of 16, 14, 12, 10, 8, 6, or 4 and a length weight of 1, 2, 3, 4, 5, or 6. Percent identity between two nucleotide sequences can be determined using the GAP program in the GCG software package (available at the world wide web address gcg.com), using a NWSgapdna.CMP matrix and a gap weight of 40, 50, 60, 70, or 80 and a length weight of 1, 2, 3, 4, 5, or 6. A set of parameters often used is a BLOSUM 62 scoring matrix with a gap open penalty of 12, a gap extend penalty of 4, and a frameshift gap penalty of 5.

Another manner for determining if two nucleic acids are substantially identical is to assess whether a polynucleotide homologous to one nucleic acid will hybridize to the other nucleic acid under stringent conditions. As used herein, the term “stringent conditions” refers to conditions for hybridization and washing. Stringent conditions are known to those skilled in the art and can be found in Current Protocols in Molecular Biology, John Wiley & Sons, N.Y., 6.3.1-6.3.6 (1989). Aqueous and non-aqueous methods are described in that reference and either can be used. An example of stringent hybridization conditions is hybridization in 6× sodium chloride/sodium citrate (SSC) at about 45° C., followed by one or more washes in 0.2×SSC, 0.1% SDS at 50° C. Another example of stringent hybridization conditions is hybridization in 6× sodium chloride/sodium citrate (SSC) at about 45° C., followed by one or more washes in 0.2×SSC, 0.1% SDS at 55° C. A further example of stringent hybridization conditions is hybridization in 6× sodium chloride/sodium citrate (SSC) at about 45° C., followed by one or more washes in 0.2×SSC, 0.1% SDS at 60° C. Often, stringent hybridization conditions are hybridization in 6× sodium chloride/sodium citrate (SSC) at about 45° C., followed by one or more washes in 0.2×SSC, 0.1% SDS at 65° C. More often, stringency conditions are 0.5 M sodium phosphate, 7% SDS at 65° C., followed by one or more washes at 0.2×SSC, 1% SDS at 65° C.

An example of a substantially identical nucleotide sequence to a ICAM, MAPK10, KIAA0861, NUMA1 or GALE nucleotide sequence is one that has a different nucleotide sequence but still encodes the same polypeptide sequence encoded by the ICAM, MAPK10, KIAA0861, NUMA1 or GALE nucleotide sequence. Another example is a nucleotide sequence that encodes a polypeptide having a polypeptide sequence that is 70% or more identical to, sometimes 75% or more, 80% or more, or 85% or more identical to, and often 90% or more or 95% or more identical to a polypeptide sequence encoded by a ICAM, MAPK10, KIAA0861, NUMA1 or GALE nucleotide sequence.

ICAM, MAPK10, KIAA0861, NUMA1 or GALE nucleotide sequences and ICAM,MAPK10, KIAA0861, NUMA1 or GALE amino acid sequences can be used as“query sequences” to perform a search against public databases toidentify other family members or related sequences, for example. Suchsearches can be performed using the NBLAST and XBLAST programs (version2.0) of Altschul et al., J. Mol. Biol. 215: 403-10 (1990). BLASTnucleotide searches can be performed with the NBLAST program, score=100,wordlength=12 to obtain nucleotide sequences homologous to nucleotidesequences from SEQ ID NO: 1-12. BLAST polypeptide searches can beperformed with the XBLAST program, score=50, wordlength=3 to obtainamino acid sequences homologous to polypeptides encoded by a ICAM,MAPK10, KIAA0861, NUMA1 or GALE nucleotide sequence. To obtain gappedalignments for comparison purposes, Gapped BLAST can be utilized asdescribed in Altschul et al., Nucleic Acids Res. 25(17): 3389-3402(1997). When utilizing BLAST and Gapped BLAST programs, defaultparameters of the respective programs (e.g., XBLAST and NBLAST) can beused (see the world wide web address ncbi.nlm.nih.gov).

A nucleic acid that is substantially identical to a ICAM, MAPK10,KIAA0861, NUMA1 or GALE nucleotide sequence may include polymorphicsites at positions equivalent to those described herein when thesequences are aligned. For example, using the alignment proceduresdescribed herein, SNPs in a sequence substantially identical to asequence in SEQ ID NO: 1-12 can be identified at nucleotide positionsthat match (i.e., align) with nucleotides at SNP positions in thenucleotide sequence of SEQ ID NO: 1-12. Also, where a polymorphicvariation results in an insertion or deletion, insertion or deletion ofa nucleotide sequence from a reference sequence can change the relativepositions of other polymorphic sites in the nucleotide sequence.
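
The idea of locating a polymorphic site at the equivalent position of an aligned sequence, and the way an insertion or deletion shifts downstream positions, can be shown with a small coordinate-mapping sketch. It assumes a pairwise alignment is already available as two equal-length gapped strings; the function name `map_position` and the 1-based coordinate convention are illustrative assumptions, and the toy sequences in the usage note are not drawn from any SEQ ID NO.

```python
def map_position(aligned_ref, aligned_query, ref_pos, gap="-"):
    """Map a 1-based position in the ungapped reference sequence to the
    1-based position of the aligned residue in the query sequence.

    Returns None when the reference position aligns to a gap in the query
    (i.e., the site is deleted in the query sequence).
    """
    if len(aligned_ref) != len(aligned_query):
        raise ValueError("aligned sequences must have the same length")

    ref_i = 0    # residues of the reference consumed so far
    query_i = 0  # residues of the query consumed so far
    for r, q in zip(aligned_ref, aligned_query):
        if r != gap:
            ref_i += 1
        if q != gap:
            query_i += 1
        if r != gap and ref_i == ref_pos:
            return query_i if q != gap else None

    raise ValueError("ref_pos lies outside the reference sequence")
```

With a reference aligned as `AC-GT` against a query `ACTGT`, reference position 3 maps to query position 4, because the single-nucleotide insertion in the query shifts every downstream site by one.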

Substantially identical nucleotide and polypeptide sequences includethose that are naturally occurring, such as allelic variants (samelocus), splice variants, homologs (different locus), and orthologs(different organism) or can be non-naturally occurring. Non-naturallyoccurring variants can be generated by mutagenesis techniques, includingthose applied to polynucleotides, cells, or organisms. The variants cancontain nucleotide substitutions, deletions, inversions and insertions.Variation can occur in either or both the coding and non-coding regions.The variations can produce both conservative and non-conservative aminoacid substitutions (as compared in the encoded product). Orthologs,homologs, allelic variants, and splice variants can be identified usingmethods known in the art. These variants normally comprise a nucleotidesequence encoding a polypeptide that is 50% or more, about 55% or more,often about 70-75% or more, more often about 80-85% or more, andtypically about 90-95% or more identical to the amino acid sequences oftarget polypeptides or a fragment thereof. Such nucleic acid moleculesreadily can be identified as being able to hybridize under stringentconditions to a nucleotide sequence in SEQ ID NO: 1-12 or a fragmentthereof. Nucleic acid molecules corresponding to orthologs, homologs,and allelic variants of a nucleotide sequence in SEQ ID NO: 1-12 can beidentified by mapping the sequence to the same chromosome or locus asthe nucleotide sequence in SEQ ID NO: 1-12.

Also, substantially identical nucleotide sequences may include codons that are altered with respect to the naturally occurring sequence for enhancing expression of a target polypeptide in a particular expression system. For example, the nucleic acid can be one in which one or more codons are altered, and often 10% or more or 20% or more of the codons are altered for optimized expression in bacteria (e.g., E. coli), yeast (e.g., S. cerevisiae), human (e.g., 293 cells), insect, or rodent (e.g., hamster) cells.

Methods for Identifying Subjects at Risk of Breast Cancer and BreastCancer Risk in a Subject

Methods for prognosing and diagnosing breast cancer in subjects areprovided herein. These methods include detecting the presence or absenceof one or more polymorphic variations associated with breast cancer in anucleotide sequence set forth in SEQ ID NO: 1-5, or substantiallyidentical sequence thereof, in a sample from a subject, where thepresence of a polymorphic variant is indicative of a risk of breastcancer.

Thus, featured herein is a method for detecting a subject at risk ofbreast cancer or the risk of breast cancer in a subject, which comprisesdetecting the presence or absence of a polymorphic variation associatedwith breast cancer at a polymorphic site in a nucleic acid sample from asubject, where the nucleotide sequence comprises a polynucleotidesequence selected from the group consisting of: (a) a nucleotidesequence set forth in SEQ ID NO: 1-5; (b) a nucleotide sequence whichencodes a polypeptide having an amino acid sequence encoded by anucleotide sequence in SEQ ID NO: 1-5; (c) a nucleotide sequence whichencodes a polypeptide that is 90% or more identical to an amino acidsequence encoded by a nucleotide sequence in SEQ ID NO: 1-5 or anucleotide sequence about 90% or more identical to the nucleotidesequence set forth in SEQ ID NO: 1-5; and (d) a fragment of a nucleotidesequence of (a), (b), or (c), often a fragment that includes apolymorphic site associated with breast cancer; whereby the presence ofthe polymorphic variation is indicative of a risk of breast cancer inthe subject.

In certain embodiments, the presence of a combination of two or more polymorphic variants associated with breast cancer in one or more genetic loci (e.g., one or more genes) of the sample is determined to identify, quantify and/or estimate risk of breast cancer. The risk often is the probability of having or developing breast cancer. The risk sometimes is expressed as a relative risk with respect to a population average risk of breast cancer, and sometimes is expressed as a relative risk with respect to the lowest risk group. Such relative risk assessments often are based upon penetrance values determined by statistical methods (see e.g., statistical analysis Example 9), and are particularly useful to clinicians and insurance companies for assessing risk of breast cancer (e.g., a clinician can target appropriate detection, prevention and therapeutic regimens to a patient after determining the patient's risk of breast cancer, and an insurance company can fine tune actuarial tables based upon population genotype assessments of breast cancer risk). Risk of breast cancer sometimes is expressed as an odds ratio, which is the odds that a person having a particular genotype has or will develop breast cancer relative to the odds for another genotype group (e.g., the most disease protective genotype or population average). In related embodiments, the determination is utilized to identify a subject at risk of breast cancer. In an embodiment, two or more polymorphic variations are detected in two or more regions in human genomic DNA associated with increased risk of breast cancer, such as regions selected from the group of loci consisting of ICAM, MAPK10, KIAA0861, NUMA1 and GALE, for example. In certain embodiments, 3 or more, or 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19 or 20 or more polymorphic variants are detected in the sample.
In specific embodiments, polymorphic variants are detectedin ICAM, MAPK10, KIAA0861, NUMA1 and GALE loci, such as at positions44247 in SEQ ID NO: 1 (ICAM), position 36424 in SEQ ID NO: 2 (MAPK10),position 48563 in SEQ ID NO: 3 (KIAA0861), position 49002 in SEQ ID NO:4 (NUMA1) and position 174 in SEQ ID NO: 5 (GALE), for example. Incertain embodiments, polymorphic variants are detected at other geneticloci (e.g., the polymorphic variants can be detected in ICAM, MAPK10,KIAA0861, NUMA1 and/or GALE in addition to other loci or only in otherloci), where the other loci include but are not limited to RAD21, KLF12,SPUVE, GRIN3A, PFTK1, SERPINA5, LOC115209, HRMT1L3, DLG1, KIAA0783,DPF3, CENPC1, GP6, LAMA4, CHCB/C20ORF154, LOC338749, and TTN/LOC351327,which are described in concurrently-filed patent applications havingattorney docket numbers 524592006700, 524592006800, 524592007000,524592007100 and 524592007200, and any others disclosed in patentapplication No. 60/429,136 (filed Nov. 25, 2002) 60/490,234 (filed Jul.24, 2003).
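
As an illustration of the relative risk and odds ratio measures described above, the following Python sketch computes both from counts of affected and unaffected subjects in two genotype groups. It is a minimal example under stated assumptions and is not the statistical analysis of Example 9: the `GenotypeCounts` layout, the function names and the counts in the usage lines are hypothetical, and the relative risk calculation assumes that the proportion of affected subjects in each group reflects the underlying population.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GenotypeCounts:
    """Hypothetical container: subjects with a given genotype, split by disease status."""
    affected: int
    unaffected: int


def odds_ratio(group: GenotypeCounts, reference: GenotypeCounts) -> float:
    """Odds of disease in `group` divided by the odds of disease in `reference`."""
    denominator = group.unaffected * reference.affected
    if denominator == 0:
        raise ValueError("odds ratio is undefined when a denominator count is zero")
    return (group.affected * reference.unaffected) / denominator


def relative_risk(group: GenotypeCounts, reference: GenotypeCounts) -> float:
    """Proportion affected in `group` divided by the proportion affected in `reference`."""
    group_total = group.affected + group.unaffected
    reference_total = reference.affected + reference.unaffected
    denominator = reference.affected * group_total
    if denominator == 0:
        raise ValueError("relative risk is undefined when a denominator count is zero")
    return (group.affected * reference_total) / denominator


if __name__ == "__main__":
    # Illustrative counts only; they are not data from this disclosure.
    carriers = GenotypeCounts(affected=30, unaffected=70)
    lowest_risk_group = GenotypeCounts(affected=10, unaffected=90)
    print(odds_ratio(carriers, lowest_risk_group))
    print(relative_risk(carriers, lowest_risk_group))
```

The reference group can be the lowest risk genotype group or a population average group, matching the two ways of expressing relative risk described above.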

A risk of developing aggressive forms of breast cancer likely to metastasize or invade surrounding tissues (e.g., Stage IIIA, IIIB, and IV breast cancers), and subjects at risk of developing aggressive forms of breast cancer also may be identified by the methods described herein. These methods include collecting phenotype information from subjects having breast cancer, which includes the stage of progression of the breast cancer, and performing a secondary phenotype analysis to detect the presence or absence of one or more polymorphic variations associated with a particular stage form of breast cancer. Thus, detecting the presence or absence of one or more polymorphic variations in an ICAM, MAPK10, KIAA0861, NUMA1 or GALE nucleotide sequence associated with a late stage form of breast cancer often is prognostic and/or diagnostic of an aggressive form of the cancer.

Results from prognostic tests may be combined with other test results to diagnose breast cancer. For example, prognostic results may be gathered, a patient sample may be ordered based on a determined predisposition to breast cancer, the patient sample is analyzed, and the results of the analysis may be utilized to diagnose breast cancer. Also, breast cancer diagnostic methods can be developed from studies used to generate prognostic/diagnostic methods in which populations are stratified into subpopulations having different progressions of breast cancer. In another embodiment, prognostic results may be gathered; a patient's risk factors for developing breast cancer are analyzed (e.g., age, race, family history, age of first menstrual cycle, age at birth of first child); and a patient sample may be ordered based on a determined predisposition to breast cancer. In an alternative embodiment, the results from predisposition analyses described herein may be combined with other test results indicative of breast cancer, which were previously, concurrently, or subsequently gathered with respect to the predisposition testing. In these embodiments, the combination of the prognostic test results with other test results can be probative of breast cancer, and the combination can be utilized as a breast cancer diagnostic. The results of any test indicative of breast cancer known in the art may be combined with the methods described herein. Examples of such tests are mammography (e.g., a more frequent and/or earlier mammography regimen may be prescribed); breast biopsy and optionally a biopsy from another tissue; breast ultrasound and optionally an ultrasound analysis of another tissue; breast magnetic resonance imaging (MRI) and optionally an MRI analysis of another tissue; electrical impedance (T-scan) analysis of breast and optionally of another tissue; ductal lavage; nuclear medicine analysis (e.g., scintimammography); BRCA1 and/or BRCA2 sequence analysis results; and thermal imaging of the breast and optionally of another tissue. Testing may be performed on tissue other than breast to diagnose the occurrence of metastasis (e.g., testing of the lymph node).

Risk of breast cancer sometimes is expressed as a probability, such as an odds ratio, percentage, or risk factor. The risk is based upon the presence or absence of one or more polymorphic variants described herein, and also may be based in part upon phenotypic traits of the individual being tested. Methods for calculating predispositions based upon patient data are well known (see, e.g., Agresti, Categorical Data Analysis, 2nd Ed. 2002. Wiley). Allelotyping and genotyping analyses may be carried out in populations other than those exemplified herein to enhance the predictive power of the prognostic method. These further analyses are executed in view of the exemplified procedures described herein, and may be based upon the same polymorphic variations or additional polymorphic variations. Risk determinations for breast cancer are useful in a variety of applications. In one embodiment, breast cancer risk determinations are used by clinicians to direct appropriate detection, preventative and treatment procedures to subjects who most require these. In another embodiment, breast cancer risk determinations are used by health insurers for preparing actuarial tables and for calculating insurance premiums.

The nucleic acid sample typically is isolated from a biological sample obtained from a subject. For example, nucleic acid can be isolated from blood, saliva, sputum, urine, cell scrapings, and biopsy tissue. The nucleic acid sample can be isolated from a biological sample using standard techniques, such as the technique described in Example 2. As used herein, the term “subject” refers primarily to humans but also refers to other mammals such as dogs, cats, and ungulates (e.g., cattle, sheep, and swine). Subjects also include avians (e.g., chickens and turkeys), reptiles, and fish (e.g., salmon), as embodiments described herein can be adapted to nucleic acid samples isolated from any of these organisms. The nucleic acid sample may be isolated from the subject and then directly utilized in a method for determining the presence of a polymorphic variant, or alternatively, the sample may be isolated and then stored (e.g., frozen) for a period of time before being subjected to analysis.

The presence or absence of a polymorphic variant is determined using one or both chromosomal complements represented in the nucleic acid sample. Determining the presence or absence of a polymorphic variant in both chromosomal complements represented in a nucleic acid sample from a subject having a copy of each chromosome is useful for determining the zygosity of an individual for the polymorphic variant (i.e., whether the individual is homozygous or heterozygous for the polymorphic variant). Any oligonucleotide-based diagnostic may be utilized to determine whether a polymorphic variant is present or absent in a sample. For example, primer extension methods, ligase sequence determination methods (e.g., U.S. Pat. Nos. 5,679,524 and 5,952,174, and WO 01/27326), mismatch sequence determination methods (e.g., U.S. Pat. Nos. 5,851,770; 5,958,692; 6,110,684; and 6,183,958), microarray sequence determination methods, restriction fragment length polymorphism (RFLP), single strand conformation polymorphism detection (SSCP) (e.g., U.S. Pat. Nos. 5,891,625 and 6,013,499), PCR-based assays (e.g., TAQMAN® PCR System (Applied Biosystems)), and nucleotide sequencing methods may be used.

Oligonucleotide extension methods typically involve providing a pair of oligonucleotide primers in a polymerase chain reaction (PCR) or in other nucleic acid amplification methods for the purpose of amplifying a region from the nucleic acid sample that comprises the polymorphic variation. One oligonucleotide primer is complementary to a region 3′ of the polymorphism and the other is complementary to a region 5′ of the polymorphism. A PCR primer pair may be used in methods disclosed in U.S. Pat. Nos. 4,683,195; 4,683,202; 4,965,188; 5,656,493; 5,998,143; 6,140,054; WO 01/27327; and WO 01/27329, for example. PCR primer pairs may also be used in any commercially available machines that perform PCR, such as any of the GENEAMP® Systems available from Applied Biosystems. Also, those of ordinary skill in the art will be able to design oligonucleotide primers based upon a nucleotide sequence set forth in SEQ ID NO: 1-5 without undue experimentation using knowledge readily available in the art.

Also provided is an extension oligonucleotide that hybridizes to the amplified fragment adjacent to the polymorphic variation. As used herein, the term “adjacent” refers to the 3′ end of the extension oligonucleotide being often 1 nucleotide from the 5′ end of the polymorphic site, and sometimes 2, 3, 4, 5, 6, 7, 8, 9, or 10 nucleotides from the 5′ end of the polymorphic site, in the nucleic acid when the extension oligonucleotide is hybridized to the nucleic acid. The extension oligonucleotide then is extended by one or more nucleotides, and the number and/or type of nucleotides that are added to the extension oligonucleotide determine whether the polymorphic variant is present. Oligonucleotide extension methods are disclosed, for example, in U.S. Pat. Nos. 4,656,127; 4,851,331; 5,679,524; 5,834,189; 5,876,934; 5,908,755; 5,912,118; 5,976,802; 5,981,186; 6,004,744; 6,013,431; 6,017,702; 6,046,005; 6,087,095; 6,210,891; and WO 01/20039. Oligonucleotide extension methods using mass spectrometry are described, for example, in U.S. Pat. Nos. 5,547,835; 5,605,798; 5,691,141; 5,849,542; 5,869,242; 5,928,906; 6,043,031; and 6,194,144, and a method often utilized is described herein in Example 2. Multiple extension oligonucleotides may be utilized in one reaction, which is referred to herein as “multiplexing.”
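
The meaning of “adjacent” given above can be illustrated with a short Python sketch that picks an extension oligonucleotide from a sequence so that its 3′ end lies 1 to 10 nucleotides from the 5′ end of the polymorphic site. This is a simplified illustration and not the primer design procedure of Example 2. The function names, the 0-based `site_index`, the default length and the toy sequence are hypothetical, and the sketch assumes the oligonucleotide has the same sequence as the strand supplied (so that it hybridizes to the complementary strand). Melting temperature, secondary structure and multiplexing constraints are ignored.

```python
def design_extension_oligo(strand: str, site_index: int,
                           length: int = 20, offset: int = 1) -> str:
    """Return an oligo whose 3' base lies `offset` nucleotides 5' of the polymorphic site.

    `strand` is read 5'->3'; `site_index` is the 0-based index of the polymorphic base.
    offset=1 places the oligo's 3' end immediately next to the polymorphic site.
    """
    if not 1 <= offset <= 10:
        raise ValueError("offset must be between 1 and 10 nucleotides")
    if length < 1 or not 0 <= site_index < len(strand):
        raise ValueError("length or site_index out of range")
    end = site_index - offset + 1   # exclusive slice end; 3' base is strand[end - 1]
    start = end - length
    if start < 0:
        raise ValueError("not enough sequence 5' of the polymorphic site")
    return strand[start:end].upper()


def single_base_extension_products(oligo: str, alleles: str) -> dict:
    """Expected products when an offset=1 oligo is extended by one terminating nucleotide."""
    return {allele: oligo + allele for allele in alleles}


if __name__ == "__main__":
    toy_strand = "ACGTTGCAAGCT"   # illustrative sequence, not taken from SEQ ID NO: 1-5
    oligo = design_extension_oligo(toy_strand, site_index=8, length=5)
    print(oligo, single_base_extension_products(oligo, "AG"))
```

With an offset of 1 and chain terminating nucleotides, each allele yields a product that differs in its final base, which is the property that extension-based detection relies on.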

A microarray can be utilized for determining whether a polymorphic variant is present or absent in a nucleic acid sample. A microarray may include any oligonucleotides described herein, and methods for making and using oligonucleotide microarrays suitable for diagnostic use are disclosed in U.S. Pat. Nos. 5,492,806; 5,525,464; 5,589,330; 5,695,940; 5,849,483; 6,018,041; 6,045,996; 6,136,541; 6,142,681; 6,156,501; 6,197,506; 6,223,127; 6,225,625; 6,229,911; 6,239,273; WO 00/52625; WO 01/25485; and WO 01/29259. The microarray typically comprises a solid support and the oligonucleotides may be linked to this solid support by covalent bonds or by non-covalent interactions. The oligonucleotides may also be linked to the solid support directly or by a spacer molecule. A microarray may comprise one or more oligonucleotides complementary to a polymorphic site set forth in SEQ ID NO: 1-5 or below.

A kit also may be utilized for determining whether a polymorphic variant is present or absent in a nucleic acid sample. A kit often comprises one or more pairs of oligonucleotide primers useful for amplifying a fragment of an ICAM, MAPK10, KIAA0861, NUMA1 or GALE nucleotide sequence or a substantially identical sequence thereof, where the fragment includes a polymorphic site. The kit sometimes comprises a polymerizing agent, for example, a thermostable nucleic acid polymerase such as one disclosed in U.S. Pat. Nos. 4,889,818 or 6,077,664. Also, the kit often comprises an elongation oligonucleotide that hybridizes to an ICAM, MAPK10, KIAA0861, NUMA1 or GALE nucleotide sequence in a nucleic acid sample adjacent to the polymorphic site. Where the kit includes an elongation oligonucleotide, it also often comprises chain elongating nucleotides, such as dATP, dTTP, dGTP, dCTP, and dITP, including analogs of dATP, dTTP, dGTP, dCTP and dITP, provided that such analogs are substrates for a thermostable nucleic acid polymerase and can be incorporated into a nucleic acid chain elongated from the extension oligonucleotide. Along with chain elongating nucleotides would be one or more chain terminating nucleotides such as ddATP, ddTTP, ddGTP, ddCTP, and the like. In an embodiment, the kit comprises one or more oligonucleotide primer pairs, a polymerizing agent, chain elongating nucleotides, at least one elongation oligonucleotide, and one or more chain terminating nucleotides. Kits optionally include buffers, vials, microtiter plates, and instructions for use.

An individual identified as being at risk of breast cancer may be heterozygous or homozygous with respect to the allele associated with a higher risk of breast cancer. A subject homozygous for an allele associated with an increased risk of breast cancer is at a comparatively high risk of breast cancer, a subject heterozygous for an allele associated with an increased risk of breast cancer is at a comparatively intermediate risk of breast cancer, and a subject homozygous for an allele associated with a decreased risk of breast cancer is at a comparatively low risk of breast cancer. A genotype may be assessed for a complementary strand, such that the complementary nucleotide at a particular position is detected.
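
The relationship between zygosity and comparative risk described above can be written as a small lookup. The Python sketch below is illustrative only: the function names are hypothetical, and it assumes a biallelic polymorphic site, so that any allele other than the risk allele is treated as the allele associated with decreased risk. The second function shows how a genotype read from the complementary strand maps back to the reference strand.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def comparative_risk(allele_1: str, allele_2: str, risk_allele: str) -> str:
    """Classify a diploid genotype by the number of copies of the risk allele."""
    copies = int(allele_1 == risk_allele) + int(allele_2 == risk_allele)
    return {
        2: "comparatively high",          # homozygous for the increased-risk allele
        1: "comparatively intermediate",  # heterozygous
        0: "comparatively low",           # homozygous for the decreased-risk allele
    }[copies]


def complement_genotype(allele_1: str, allele_2: str) -> tuple:
    """Convert a genotype detected on the complementary strand to the other strand."""
    return COMPLEMENT[allele_1], COMPLEMENT[allele_2]


if __name__ == "__main__":
    # Hypothetical site where "A" is the allele associated with increased risk.
    print(comparative_risk("A", "G", risk_allele="A"))
    # A "T/C" call on the complementary strand corresponds to "A/G" here.
    print(comparative_risk(*complement_genotype("T", "C"), risk_allele="A"))
```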

Also featured are methods for determining risk of breast cancer and/or identifying a subject at risk of breast cancer by contacting a polypeptide or protein encoded by an ICAM, MAPK10, KIAA0861, NUMA1 or GALE nucleotide sequence from a subject with an antibody that specifically binds to an epitope associated with increased risk of breast cancer in the polypeptide. In certain embodiments, the antibody specifically binds to an epitope that comprises a leucine at amino acid position 359 in SEQ ID NO: 17, a leucine at amino acid position 378 in SEQ ID NO: 17, or an alanine at amino acid position 857 in SEQ ID NO: 17, a proline at amino acid position 352 in SEQ ID NO: 15 or an alanine at amino acid position 348 in SEQ ID NO: 15.

Applications of Prognostic and Diagnostic Results to Pharmacogenomic Methods

Pharmacogenomics is a discipline that involves tailoring a treatment for a subject according to the subject's genotype. For example, based upon the outcome of a prognostic test described herein, a clinician or physician may target pertinent information and preventative or therapeutic treatments to a subject who would be benefited by the information or treatment and avoid directing such information and treatments to a subject who would not be benefited (e.g., the treatment has no therapeutic effect and/or the subject experiences adverse side effects). As therapeutic approaches for breast cancer continue to evolve and improve, the goal of treatments for breast cancer related disorders is to intervene even before clinical signs (e.g., identification of a lump in the breast) first manifest. Thus, genetic markers associated with susceptibility to breast cancer prove useful for early diagnosis, prevention and treatment of breast cancer.

The following is an example of a pharmacogenomic embodiment. A particular treatment regimen can exert a differential effect depending upon the subject's genotype. Where a candidate therapeutic exhibits a significant interaction with a major allele and a comparatively weak interaction with a minor allele (e.g., an order of magnitude or greater difference in the interaction), such a therapeutic typically would not be administered to a subject genotyped as being homozygous for the minor allele, and sometimes not administered to a subject genotyped as being heterozygous for the minor allele. In another example, where a candidate therapeutic is not significantly toxic when administered to subjects who are homozygous for a major allele but is comparatively toxic when administered to subjects heterozygous or homozygous for a minor allele, the candidate therapeutic is not typically administered to subjects who are genotyped as being heterozygous or homozygous with respect to the minor allele.

The methods described herein are applicable to pharmacogenomic methods for detecting, preventing, alleviating and/or treating breast cancer. For example, a nucleic acid sample from an individual may be subjected to a genetic test described herein. Where one or more polymorphic variations associated with increased risk of breast cancer are identified in a subject, information for detecting, preventing or treating breast cancer and/or one or more breast cancer detection, prevention and/or treatment regimens then may be directed to and/or prescribed to that subject.

In certain embodiments, a detection, preventative and/or treatment regimen is specifically prescribed and/or administered to individuals who will most benefit from it based upon their risk of developing breast cancer assessed by the methods described herein. Thus, provided are methods for identifying a subject at risk of breast cancer and then prescribing a detection, therapeutic or preventative regimen to individuals identified as being at risk of breast cancer. Thus, certain embodiments are directed to methods for treating breast cancer in a subject, reducing risk of breast cancer in a subject, or early detection of breast cancer in a subject, which comprise: detecting the presence or absence of a polymorphic variant associated with breast cancer in a nucleotide sequence in a nucleic acid sample from a subject, where the nucleotide sequence comprises a polynucleotide sequence selected from the group consisting of: (a) a nucleotide sequence set forth in SEQ ID NO: 1-5; (b) a nucleotide sequence which encodes a polypeptide having an amino acid sequence encoded by a nucleotide sequence in SEQ ID NO: 1-5; (c) a nucleotide sequence which encodes a polypeptide that is 90% or more identical to an amino acid sequence encoded by a nucleotide sequence in SEQ ID NO: 1-5 or a nucleotide sequence about 90% or more identical to the nucleotide sequence set forth in SEQ ID NO: 1-5; and (d) a fragment of a nucleotide sequence of (a), (b), or (c), sometimes comprising a polymorphic site associated with breast cancer; and prescribing or administering a breast cancer treatment regimen, preventative regimen and/or detection regimen to a subject from whom the sample originated where the presence of one or more polymorphic variations associated with breast cancer is detected in the nucleotide sequence. In these methods, genetic results may be utilized in combination with other test results to diagnose breast cancer as described above. Other test results include but are not limited to mammography results, imaging results, biopsy results and BRCA1 or BRCA2 test results, as described above.

Detection regimens include one or more mammography procedures; a regular mammography regimen (e.g., once a year, or once every six, four, three or two months); an early mammography regimen (e.g., mammography tests are performed beginning at age 25, 30, or 35); one or more biopsy procedures (e.g., a regular biopsy regimen beginning at age 40); breast biopsy and biopsy from other tissue; breast ultrasound and optionally ultrasound analysis of another tissue; breast magnetic resonance imaging (MRI) and optionally MRI analysis of another tissue; electrical impedance (T-scan) analysis of breast and optionally another tissue; ductal lavage; nuclear medicine analysis (e.g., scintimammography); BRCA1 and/or BRCA2 sequence analysis results; and/or thermal imaging of the breast and optionally another tissue.

Treatments sometimes are preventative (e.g., are prescribed or administered to reduce the probability that a breast cancer associated condition arises or progresses), sometimes are therapeutic, and sometimes delay, alleviate or halt the progression of breast cancer. Any known preventative or therapeutic treatment for alleviating or preventing the occurrence of breast cancer is prescribed and/or administered. For example, certain preventative treatments often are prescribed to subjects having a predisposition to breast cancer and where the subject is not diagnosed with breast cancer or is diagnosed as having symptoms indicative of early stage breast cancer (e.g., stage I). For subjects not diagnosed as having breast cancer, any preventative treatments known in the art can be prescribed and administered, which include selective hormone receptor modulators (e.g., selective estrogen receptor modulators (SERMs) such as tamoxifen, raloxifene, and toremifene); compositions that prevent production of hormones (e.g., aromatase inhibitors that prevent the production of estrogen in the adrenal gland, such as exemestane, letrozole, anastrozole, goserelin, and megestrol); other hormonal treatments (e.g., goserelin acetate and fulvestrant); biologic response modifiers such as antibodies (e.g., trastuzumab (herceptin/HER2)); surgery (e.g., lumpectomy and mastectomy); drugs that delay or halt metastasis (e.g., pamidronate disodium); and alternative/complementary medicine (e.g., acupuncture, acupressure, moxibustion, qi gong, reiki, ayurveda, vitamins, minerals, and herbs (e.g., astragalus root, burdock root, garlic, green tea, and licorice root)).

Breast cancer treatments are well known in the art, and include surgery, chemotherapy and/or radiation therapy. Any of the treatments may be used in combination to treat or prevent breast cancer (e.g., surgery followed by radiation therapy or chemotherapy). Examples of chemotherapy combinations used to treat breast cancer include: cyclophosphamide (Cytoxan), methotrexate (Amethopterin, Mexate, Folex), and fluorouracil (Fluorouracil, 5-Fu, Adrucil), which is referred to as CMF; cyclophosphamide, doxorubicin (Adriamycin), and fluorouracil, which is referred to as CAF; and doxorubicin (Adriamycin) and cyclophosphamide, which is referred to as AC.

As breast cancer preventative and treatment information can be specifically targeted to subjects in need thereof (e.g., those at risk of developing breast cancer or those that have early signs of breast cancer), provided herein is a method for preventing or reducing the risk of developing breast cancer in a subject, which comprises: (a) detecting the presence or absence of a polymorphic variation associated with breast cancer at a polymorphic site in a nucleotide sequence in a nucleic acid sample from a subject; (b) identifying a subject with a predisposition to breast cancer, whereby the presence of the polymorphic variation is indicative of a predisposition to breast cancer in the subject; and (c) if such a predisposition is identified, providing the subject with information about methods or products to prevent or reduce breast cancer or to delay the onset of breast cancer. Also provided is a method of targeting information or advertising to a subpopulation of a human population based on the subpopulation being genetically predisposed to a disease or condition, which comprises: (a) detecting the presence or absence of a polymorphic variation associated with breast cancer at a polymorphic site in a nucleotide sequence in a nucleic acid sample from a subject; (b) identifying the subpopulation of subjects in which the polymorphic variation is associated with breast cancer; and (c) providing information only to the subpopulation of subjects about a particular product which may be obtained and consumed or applied by the subject to help prevent or delay onset of the disease or condition.

Pharmacogenomics methods also may be used to analyze and predict a response to a breast cancer treatment or a drug. For example, if pharmacogenomics analysis indicates a likelihood that an individual will respond positively to a breast cancer treatment with a particular drug, the drug may be administered to the individual. Conversely, if the analysis indicates that an individual is likely to respond negatively to treatment with a particular drug, an alternative course of treatment may be prescribed. A negative response may be defined as either the absence of an efficacious response or the presence of toxic side effects. The response to a therapeutic treatment can be predicted in a background study in which subjects in any of the following populations are genotyped: a population that responds favorably to a treatment regimen, a population that does not respond significantly to a treatment regimen, and a population that responds adversely to a treatment regimen (e.g., exhibits one or more side effects). These populations are provided as examples and other populations and subpopulations may be analyzed. Based upon the results of these analyses, a subject is genotyped to predict whether he or she will respond favorably to a treatment regimen, not respond significantly to a treatment regimen, or respond adversely to a treatment regimen.

The methods described herein also are applicable to clinical drug trials. One or more polymorphic variants indicative of response to an agent for treating breast cancer or to side effects to an agent for treating breast cancer may be identified using the methods described herein. Thereafter, potential participants in clinical trials of such an agent may be screened to identify those individuals most likely to respond favorably to the drug and exclude those likely to experience side effects. In that way, the effectiveness of drug treatment may be measured in individuals who respond positively to the drug, without lowering the measurement as a result of the inclusion of individuals who are unlikely to respond positively in the study and without risking undesirable safety problems. In certain embodiments, the agent for treating breast cancer described herein targets ICAM, MAPK10, KIAA0861, NUMA1 or GALE or a target in the ICAM, MAPK10, KIAA0861, NUMA1 or GALE pathway.

Thus, another embodiment is a method of selecting an individual for inclusion in a clinical trial of a treatment or drug comprising the steps of: (a) obtaining a nucleic acid sample from an individual; (b) determining the identity of a polymorphic variation which is associated with a positive response to the treatment or the drug, or at least one polymorphic variation which is associated with a negative response to the treatment or the drug in the nucleic acid sample, and (c) including the individual in the clinical trial if the nucleic acid sample contains said polymorphic variation associated with a positive response to the treatment or the drug or if the nucleic acid sample lacks said polymorphic variation associated with a negative response to the treatment or the drug. In addition, the methods for selecting an individual for inclusion in a clinical trial of a treatment or drug encompass methods with any further limitation described in this disclosure, or those following, specified alone or in any combination. The polymorphic variation may be in a sequence selected individually or in any combination from the group consisting of (i) a polynucleotide sequence set forth in SEQ ID NO: 1-5; (ii) a polynucleotide sequence that is 90% or more identical to a nucleotide sequence set forth in SEQ ID NO: 1-5; (iii) a polynucleotide sequence that encodes a polypeptide having an amino acid sequence identical to or 90% or more identical to an amino acid sequence encoded by a nucleotide sequence set forth in SEQ ID NO: 1-5; and (iv) a fragment of a polynucleotide sequence of (i), (ii), or (iii) comprising the polymorphic site. The including step (c) optionally comprises administering the drug or the treatment to the individual if the nucleic acid sample contains the polymorphic variation associated with a positive response to the treatment or the drug and the nucleic acid sample lacks said biallelic marker associated with a negative response to the treatment or the drug.
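
The inclusion criterion of step (c) and the optional administering condition differ in their logic: inclusion uses “or”, while administration requires both conditions. The Python sketch below makes that difference explicit. It is a hypothetical illustration only: the function names and the variant labels are invented for the example, and each sample is represented simply as the set of variant labels detected in it.

```python
from typing import AbstractSet


def include_in_trial(detected: AbstractSet[str],
                     positive_variant: str,
                     negative_variant: str) -> bool:
    """Step (c): include if the positive-response variant is present
    OR the negative-response variant is absent."""
    return positive_variant in detected or negative_variant not in detected


def administer_treatment(detected: AbstractSet[str],
                         positive_variant: str,
                         negative_variant: str) -> bool:
    """Optional part of step (c): administer only if the positive-response
    variant is present AND the negative-response variant is absent."""
    return positive_variant in detected and negative_variant not in detected


if __name__ == "__main__":
    # Hypothetical variant labels; they do not refer to sites in SEQ ID NO: 1-5.
    sample = {"variant_pos", "variant_neg"}
    print(include_in_trial(sample, "variant_pos", "variant_neg"))
    print(administer_treatment(sample, "variant_pos", "variant_neg"))
```

A sample carrying both variants satisfies the inclusion criterion as written but not the stricter administering condition.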

Also provided herein is a method of partnering between a diagnostic/prognostic testing provider and a provider of a consumable product, which comprises: (a) the diagnostic/prognostic testing provider detects the presence or absence of a polymorphic variation associated with breast cancer at a polymorphic site in a nucleotide sequence in a nucleic acid sample from a subject; (b) the diagnostic/prognostic testing provider identifies the subpopulation of subjects in which the polymorphic variation is associated with breast cancer; (c) the diagnostic/prognostic testing provider forwards information to the subpopulation of subjects about a particular product which may be obtained and consumed or applied by the subject to help prevent or delay onset of the disease or condition; and (d) the provider of a consumable product forwards to the diagnostic test provider a fee every time the diagnostic/prognostic test provider forwards information to the subject as set forth in step (c) above.

Compositions Comprising Breast Cancer-Directed Molecules

Featured herein is a composition comprising a breast cancer cell and one or more molecules specifically directed and targeted to a nucleic acid comprising an ICAM, MAPK10, KIAA0861, NUMA1 or GALE nucleotide sequence or an ICAM, MAPK10, KIAA0861, NUMA1 or GALE polypeptide. Such directed molecules include, but are not limited to, a compound that binds to an ICAM, MAPK10, KIAA0861, NUMA1 or GALE nucleic acid or an ICAM, MAPK10, KIAA0861, NUMA1 or GALE polypeptide; an RNAi or siRNA molecule having a strand complementary to an ICAM, MAPK10, KIAA0861, NUMA1 or GALE nucleotide sequence; an antisense nucleic acid complementary to an RNA encoded by an ICAM, MAPK10, KIAA0861, NUMA1 or GALE DNA sequence; a ribozyme that hybridizes to an ICAM, MAPK10, KIAA0861, NUMA1 or GALE nucleotide sequence; a nucleic acid aptamer that specifically binds an ICAM, MAPK10, KIAA0861, NUMA1 or GALE polypeptide; and an antibody that specifically binds to an ICAM, MAPK10, KIAA0861, NUMA1 or GALE polypeptide or binds to an ICAM, MAPK10, KIAA0861, NUMA1 or GALE nucleic acid. In certain embodiments, the antibody specifically binds to an epitope that comprises a leucine at amino acid position 359 in SEQ ID NO: 17, a leucine at amino acid position 378 in SEQ ID NO: 17, or an alanine at amino acid position 857 in SEQ ID NO: 17, a proline at amino acid position 352 in SEQ ID NO: 15 or an alanine at amino acid position 348 in SEQ ID NO: 15. In specific embodiments, the breast cancer directed molecule interacts with an ICAM, MAPK10, KIAA0861, NUMA1 or GALE nucleic acid or polypeptide variant associated with breast cancer. In other embodiments, the breast cancer directed molecule interacts with a polypeptide involved in the ICAM, MAPK10, KIAA0861, NUMA1 or GALE signal pathway, or a nucleic acid encoding such a polypeptide. Polypeptides involved in the ICAM, MAPK10, KIAA0861, NUMA1 or GALE signal pathway are discussed herein.

Compositions sometimes include an adjuvant known to stimulate an immune response, and in certain embodiments, an adjuvant that stimulates a T-cell lymphocyte response. Adjuvants are known, including but not limited to an aluminum adjuvant (e.g., aluminum hydroxide); a cytokine adjuvant or adjuvant that stimulates a cytokine response (e.g., interleukin (IL)-12 and/or γ-interferon cytokines); a Freund-type mineral oil adjuvant emulsion (e.g., Freund's complete or incomplete adjuvant); a synthetic lipoid compound; a copolymer adjuvant (e.g., TitreMax); a saponin; Quil A; a liposome; an oil-in-water emulsion (e.g., an emulsion stabilized by Tween 80 and pluronic polyoxyethylene/polyoxypropylene block copolymer (Syntex Adjuvant Formulation); TitreMax; detoxified endotoxin (MPL) and mycobacterial cell wall components (TDW, CWS) in 2% squalene (Ribi Adjuvant System)); a muramyl dipeptide; an immune-stimulating complex (ISCOM, e.g., an Ag-modified saponin/cholesterol micelle that forms a stable cage-like structure); an aqueous phase adjuvant that does not have a depot effect (e.g., Gerbu adjuvant); a carbohydrate polymer (e.g., AdjuPrime); L-tyrosine; a mannide-oleate compound (e.g., Montanide); an ethylene-vinyl acetate copolymer (e.g., Elvax 40W1,2); or lipid A, for example. Such compositions are useful for generating an immune response against a breast cancer directed molecule (e.g., an HLA-binding subsequence within a polypeptide encoded by a nucleotide sequence in SEQ ID NO: 1). In such methods, a peptide having an amino acid subsequence of a polypeptide encoded by a nucleotide sequence in SEQ ID NO: 1-5 is delivered to a subject, where the subsequence binds to an HLA molecule and induces a CTL lymphocyte response. The peptide sometimes is delivered to the subject as an isolated peptide or as a minigene in a plasmid that encodes the peptide. Methods for identifying HLA-binding subsequences in such polypeptides are known (see e.g., publication WO02/20616 and PCT application U.S. Ser. No. 98/01,373 for methods of identifying such sequences).

The breast cancer cell may be in a group of breast cancer cells and/or other types of cells cultured in vitro or in a tissue having breast cancer cells (e.g., a melanocytic lesion) maintained in vitro or present in an animal in vivo (e.g., a rat, mouse, ape or human). In certain embodiments, a composition comprises a component from a breast cancer cell or from a subject having a breast cancer cell instead of the breast cancer cell or in addition to the breast cancer cell, where the component sometimes is a nucleic acid molecule (e.g., genomic DNA), a protein mixture or isolated protein, for example. The aforementioned compositions have utility in diagnostic, prognostic and pharmacogenomic methods described previously and in breast cancer therapeutics described hereafter. Certain breast cancer molecules are described in greater detail below.

Compounds

Compounds can be obtained using any of the numerous approaches in combinatorial library methods known in the art, including: biological libraries; peptoid libraries (libraries of molecules having the functionalities of peptides, but with a novel, non-peptide backbone, which are resistant to enzymatic degradation but which nevertheless remain bioactive (see, e.g., Zuckermann et al., J. Med. Chem. 37: 2678-85 (1994))); spatially addressable parallel solid phase or solution phase libraries; synthetic library methods requiring deconvolution; “one-bead one-compound” library methods; and synthetic library methods using affinity chromatography selection. Biological library and peptoid library approaches are typically limited to peptide libraries, while the other approaches are applicable to peptide, non-peptide oligomer or small molecule libraries of compounds (Lam, Anticancer Drug Des. 12: 145 (1997)). Examples of methods for synthesizing molecular libraries are described, for example, in DeWitt et al., Proc. Natl. Acad. Sci. U.S.A. 90: 6909 (1993); Erb et al., Proc. Natl. Acad. Sci. USA 91: 11422 (1994); Zuckermann et al., J. Med. Chem. 37: 2678 (1994); Cho et al., Science 261: 1303 (1993); Carell et al., Angew. Chem. Int. Ed. Engl. 33: 2059 (1994); Carell et al., Angew. Chem. Int. Ed. Engl. 33: 2061 (1994); and in Gallop et al., J. Med. Chem. 37: 1233 (1994).

Libraries of compounds may be presented in solution (e.g., Houghten, Biotechniques 13: 412-421 (1992)), or on beads (Lam, Nature 354: 82-84 (1991)), chips (Fodor, Nature 364: 555-556 (1993)), bacteria or spores (Ladner, U.S. Pat. No. 5,223,409), plasmids (Cull et al., Proc. Natl. Acad. Sci. USA 89: 1865-1869 (1992)) or on phage (Scott and Smith, Science 249: 386-390 (1990); Devlin, Science 249: 404-406 (1990); Cwirla et al., Proc. Natl. Acad. Sci. 87: 6378-6382 (1990); Felici, J. Mol. Biol. 222: 301-310 (1991); Ladner supra.).

A compound sometimes alters expression and sometimes alters activity ofa ICAM, MAPK10, KIAA0861, NUMA1 or GALE polypeptide and may be a smallmolecule. Small molecules include, but are not limited to, peptides,peptidomimetics (e.g., peptoids), amino acids, amino acid analogs,polynucleotides, polynucleotide analogs, nucleotides, nucleotideanalogs, organic or inorganic compounds (i.e., including heteroorganicand organometallic compounds) having a molecular weight less than about10,000 grams per mole, organic or inorganic compounds having a molecularweight less than about 5,000 grams per mole, organic or inorganiccompounds having a molecular weight less than about 1,000 grams permole, organic or inorganic compounds having a molecular weight less thanabout 500 grams per mole, and salts, esters, and other pharmaceuticallyacceptable forms of such compounds.

Antisense Nucleic Acid Molecules, Ribozymes, RNAi, siRNA and Modified Nucleic Acid Molecules

An “antisense” nucleic acid refers to a nucleotide sequence complementary to a “sense” nucleic acid encoding a polypeptide, e.g., complementary to the coding strand of a double-stranded cDNA molecule or complementary to an mRNA sequence. The antisense nucleic acid can be complementary to an entire coding strand in SEQ ID NO: 1-12, or to a portion thereof or a substantially identical sequence thereof. In another embodiment, the antisense nucleic acid molecule is antisense to a “noncoding region” of the coding strand of a nucleotide sequence in SEQ ID NO: 1-12 (e.g., 5′ and 3′ untranslated regions).

An antisense nucleic acid can be designed such that it is complementary to the entire coding region of an mRNA encoded by a nucleotide sequence in SEQ ID NO: 1-4 (e.g., SEQ ID NO: 6-12), and often the antisense nucleic acid is an oligonucleotide antisense to only a portion of a coding or noncoding region of the mRNA. For example, the antisense oligonucleotide can be complementary to the region surrounding the translation start site of the mRNA, e.g., between the −10 and +10 regions of the target gene nucleotide sequence of interest. An antisense oligonucleotide can be, for example, about 7, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, or more nucleotides in length. The antisense nucleic acids, which include the ribozymes described hereafter, can be designed to target a nucleotide sequence in SEQ ID NO: 1-12, often a variant associated with breast cancer, or a substantially identical sequence thereof. Among the variants, minor alleles and major alleles can be targeted, and antisense nucleic acids directed to those associated with a higher risk of breast cancer are often designed, tested, and administered to subjects.
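By way of illustration only, the start-site example above can be sketched in Python. The function name, the placeholder sequence in the usage line, and the numbering convention (position +1 is the A of the AUG start codon, with no position 0, so that the −10 to +10 window spans 20 nucleotides) are assumptions made for this sketch and are not taken from SEQ ID NO: 1-12.

```python
# Hypothetical helper: antisense oligonucleotide for the region around
# the translation start site of an mRNA (assumed window: -10 to +10).
_RNA_COMPLEMENT = str.maketrans("ACGU", "UGCA")


def antisense_oligo(mrna, start_index, upstream=10, downstream=10):
    """Return the reverse complement of the window around the start codon.

    mrna        -- mRNA sequence, 5' to 3' (T is accepted and read as U)
    start_index -- 0-based index of the A of the AUG start codon
    """
    mrna = mrna.upper().replace("T", "U")
    lo = start_index - upstream
    hi = start_index + downstream
    if lo < 0 or hi > len(mrna):
        raise ValueError("window extends beyond the supplied sequence")
    if mrna[start_index:start_index + 3] != "AUG":
        raise ValueError("start_index does not point at an AUG codon")
    window = mrna[lo:hi]
    # Complement each base, then reverse so the oligo reads 5' to 3'.
    return window.translate(_RNA_COMPLEMENT)[::-1]


# Placeholder sequence, not a sequence from this specification.
oligo = antisense_oligo("CCCCCCCCCCAUGGCCGCCG", start_index=10)
```

The sketch returns the oligonucleotide in the RNA alphabet; a DNA or chemically modified oligonucleotide would use the same base order with the corresponding residues.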

An antisense nucleic acid can be constructed using chemical synthesisand enzymatic ligation reactions using standard procedures. For example,an antisense nucleic acid (e.g., an antisense oligonucleotide) can bechemically synthesized using naturally occurring nucleotides orvariously modified nucleotides designed to increase the biologicalstability of the molecules or to increase the physical stability of theduplex formed between the antisense and sense nucleic acids, e.g.,phosphorothioate derivatives and acridine substituted nucleotides can beused. Antisense nucleic acid also can be produced biologically using anexpression vector into which a nucleic acid has been subcloned in anantisense orientation (i.e., RNA transcribed from the inserted nucleicacid will be of an antisense orientation to a target nucleic acid ofinterest, described further in the following subsection).

When utilized as therapeutics, antisense nucleic acids typically areadministered to a subject (e.g., by direct injection at a tissue site)or generated in situ such that they hybridize with or bind to cellularmRNA and/or genomic DNA encoding a polypeptide and thereby inhibitexpression of the polypeptide, for example, by inhibiting transcriptionand/or translation. Alternatively, antisense nucleic acid molecules canbe modified to target selected cells and then are administeredsystemically. For systemic administration, antisense molecules can bemodified such that they specifically bind to receptors or antigensexpressed on a selected cell surface, for example, by linking antisensenucleic acid molecules to peptides or antibodies which bind to cellsurface receptors or antigens. Antisense nucleic acid molecules can alsobe delivered to cells using the vectors described herein. Sufficientintracellular concentrations of antisense molecules are achieved byincorporating a strong promoter, such as a pol II or pol III promoter,in the vector construct.

Antisense nucleic acid molecules sometimes are α-anomeric nucleic acid molecules. An α-anomeric nucleic acid molecule forms specific double-stranded hybrids with complementary RNA in which, contrary to the usual β-units, the strands run parallel to each other (Gaultier et al., Nucleic Acids Res. 15: 6625-6641 (1987)). Antisense nucleic acid molecules can also comprise a 2′-O-methylribonucleotide (Inoue et al., Nucleic Acids Res. 15: 6131-6148 (1987)) or a chimeric RNA-DNA analogue (Inoue et al., FEBS Lett. 215: 327-330 (1987)). Antisense nucleic acids sometimes are composed of DNA or PNA or any other nucleic acid derivatives described previously.

In another embodiment, an antisense nucleic acid is a ribozyme. A ribozyme having specificity for an ICAM, MAPK10, KIAA0861, NUMA1 or GALE nucleotide sequence can include one or more sequences complementary to such a nucleotide sequence, and a sequence having a known catalytic region responsible for mRNA cleavage (see e.g., U.S. Pat. No. 5,093,246 or Haseloff and Gerlach, Nature 334: 585-591 (1988)). For example, a derivative of a Tetrahymena L-19 IVS RNA is sometimes utilized in which the nucleotide sequence of the active site is complementary to the nucleotide sequence to be cleaved in an mRNA (see e.g., Cech et al. U.S. Pat. No. 4,987,071; and Cech et al. U.S. Pat. No. 5,116,742). Also, target mRNA sequences can be used to select a catalytic RNA having a specific ribonuclease activity from a pool of RNA molecules (see e.g., Bartel & Szostak, Science 261: 1411-1418 (1993)).

Breast cancer directed molecules include, in certain embodiments, nucleic acids that can form triple helix structures with an ICAM, MAPK10, KIAA0861, NUMA1 or GALE nucleotide sequence or a substantially identical sequence thereof, especially one that includes a regulatory region that controls expression of a polypeptide. Gene expression can be inhibited by targeting nucleotide sequences complementary to the regulatory region of an ICAM, MAPK10, KIAA0861, NUMA1 or GALE nucleotide sequence or a substantially identical sequence (e.g., promoter and/or enhancers) to form triple helical structures that prevent transcription of a gene in target cells (see e.g., Helene, Anticancer Drug Des. 6(6): 569-84 (1991); Helene et al., Ann. N.Y. Acad. Sci. 660: 27-36 (1992); and Maher, Bioassays 14(12): 807-15 (1992)). Potential sequences that can be targeted for triple helix formation can be increased by creating a so-called “switchback” nucleic acid molecule. Switchback molecules are synthesized in an alternating 5′-3′, 3′-5′ manner, such that they base pair with first one strand of a duplex and then the other, eliminating the necessity for a sizeable stretch of either purines or pyrimidines to be present on one strand of a duplex.

Breast cancer directed molecules include RNAi and siRNA nucleic acids.Gene expression may be inhibited by the introduction of double-strandedRNA (dsRNA), which induces potent and specific gene silencing, aphenomenon called RNA interference or RNAi. See, e.g., Fire et al., U.S.Pat. No. 6,506,559; Tuschl et al. PCT International Publication No. WO01/75164; Kay et al. PCT International Publication No. WO 03/010180A1;or Bosher J M, Labouesse, Nat Cell Biol February 2000;2(2):E31-6. Thisprocess has been improved by decreasing the size of the double-strandedRNA to 20-24 base pairs (to create small-interfering RNAs or siRNAs)that “switched off” genes in mammalian cells without initiating an acutephase response, i.e., a host defense mechanism that often results incell death (see, e.g., Caplen et al. Proc Natl Acad Sci USA. Aug. 14,2001;98(17):9742-7 and Elbashir et al. Methods February2002;26(2):199-213). There is increasing evidence ofpost-transcriptional gene silencing by RNA interference (RNAi) forinhibiting targeted expression in mammalian cells at the mRNA level, inhuman cells. There is additional evidence of effective methods forinhibiting the proliferation and migration of tumor cells in humanpatients, and for inhibiting metastatic cancer development (see, e.g.,U.S. Patent Application No. US2001000993183; Caplen et al. Proc NatlAcad Sci USA; and Abderrahmani et al. Mol Cell Biol Nov. 21, 2001(21):7256-67).

An “siRNA” or “RNAi” refers to a nucleic acid that forms a doublestranded RNA and has the ability to reduce or inhibit expression of agene or target gene when the siRNA is delivered to or expressed in thesame cell as the gene or target gene. “siRNA” refers to shortdouble-stranded RNA formed by the complementary strands. Complementaryportions of the siRNA that hybridize to form the double strandedmolecule often have substantial or complete identity to the targetmolecule sequence. In one embodiment, an siRNA refers to a nucleic acidthat has substantial or complete identity to a target gene and forms adouble stranded siRNA.

When designing the siRNA molecules, the targeted region often is selected from a given DNA sequence beginning 50 to 100 nucleotides downstream of the start codon. See, e.g., Elbashir et al. Methods 26:199-213 (2002). Initially, 5′ or 3′ UTRs and regions near the start codon were avoided on the assumption that UTR-binding proteins and/or translation initiation complexes may interfere with binding of the siRNP or RISC endonuclease complex. Regions of the target 23 nucleotides in length conforming to the sequence motif AA(N19)TT (N, any nucleotide), and regions with approximately 30% to 70% G/C-content (often about 50% G/C-content), often are selected. If no suitable sequences are found, the search often is extended using the motif NA(N21). The sequence of the sense siRNA sometimes corresponds to (N19)TT or N21 (position 3 to 23 of the 23-nt motif), respectively. In the latter case, the 3′ end of the sense siRNA often is converted to TT. The rationale for this sequence conversion is to generate a symmetric duplex with respect to the sequence composition of the sense and antisense 3′ overhangs. The antisense siRNA is synthesized as the complement to position 1 to 21 of the 23-nt motif. Because position 1 of the 23-nt motif is not recognized sequence-specifically by the antisense siRNA, the 3′-most nucleotide residue of the antisense siRNA can be chosen deliberately. However, the penultimate nucleotide of the antisense siRNA (complementary to position 2 of the 23-nt motif) often is complementary to the targeted sequence. For simplifying chemical synthesis, TT often is utilized. siRNAs corresponding to the target motif NAR(N17)YNN, where R is purine (A,G) and Y is pyrimidine (C,U), often are selected. Respective 21 nucleotide sense and antisense siRNAs often begin with a purine nucleotide and can also be expressed from pol III expression vectors without a change in targeting site. Expression of RNAs from pol III promoters often is efficient when the first transcribed nucleotide is a purine.
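The AA(N19)TT selection rule described above can be illustrated with a minimal Python sketch. It is offered under stated assumptions rather than as a definitive implementation: the function names are hypothetical, the input is assumed to be a coding DNA sequence whose index 0 is the first nucleotide of the start codon, the G/C-content is computed over the whole 23-nt motif, and only the lower bound of the 50 to 100 nucleotide offset is enforced. The NA(N21) fallback and the NAR(N17)YNN motif are not covered.

```python
import re

_DNA_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Reverse complement of an upper-case DNA string."""
    return dna.translate(_DNA_COMPLEMENT)[::-1]


def gc_fraction(seq):
    """Fraction of G and C residues in an upper-case sequence."""
    return (seq.count("G") + seq.count("C")) / len(seq)


def find_sirna_candidates(cds, min_offset=50, gc_min=0.30, gc_max=0.70):
    """Scan a coding sequence for 23-nt AA(N19)TT target motifs.

    Returns one dict per hit with the motif start, the motif, its G/C
    fraction, the sense strand (motif positions 3-23, i.e. N19 + TT) and
    the antisense strand (complement of motif positions 1-21).
    Sequences are kept in the DNA alphabet; conversion to RNA is omitted.
    """
    cds = cds.upper()
    candidates = []
    # A lookahead is used so that overlapping motifs are all reported.
    for match in re.finditer(r"(?=(AA[ACGT]{19}TT))", cds):
        start = match.start()
        if start < min_offset:
            continue  # too close to the start codon
        motif = match.group(1)
        gc = gc_fraction(motif)
        if not gc_min <= gc <= gc_max:
            continue
        candidates.append({
            "start": start,
            "motif": motif,
            "gc": gc,
            "sense": motif[2:],
            "antisense": reverse_complement(motif[:21]),
        })
    return candidates


# Toy sequence for illustration only, not a sequence from this specification.
toy = "ATG" + "C" * 47 + "AA" + "GCATGCATGCATGCATGCA" + "TT"
hits = find_sirna_candidates(toy)
```

Because the antisense strand is the reverse complement of motif positions 1 to 21, its 3′ end is always TT under this motif, which matches the symmetric 3′ overhangs discussed above.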

The sequence of the siRNA can correspond to the full length target gene, or a subsequence thereof. Often, the siRNA is about 15 to about 50 nucleotides in length (e.g., each complementary sequence of the double stranded siRNA is 15-50 nucleotides in length, and the double stranded siRNA is about 15-50 base pairs in length), sometimes about 20-30 nucleotides in length or about 20-25 nucleotides in length, e.g., 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, or 30 nucleotides in length. The siRNA sometimes is about 21 nucleotides in length. Methods of using siRNA are well known in the art, and specific siRNA molecules may be purchased from a number of companies including Dharmacon Research, Inc.

Antisense, ribozyme, RNAi and siRNA nucleic acids can be altered to form modified nucleic acid molecules. The nucleic acids can be altered at base moieties, sugar moieties or phosphate backbone moieties to improve stability, hybridization, or solubility of the molecule. For example, the deoxyribose phosphate backbone of nucleic acid molecules can be modified to generate peptide nucleic acids (see Hyrup et al., Bioorganic & Medicinal Chemistry 4 (1): 5-23 (1996)). As used herein, the terms “peptide nucleic acid” and “PNA” refer to a nucleic acid mimic such as a DNA mimic, in which the deoxyribose phosphate backbone is replaced by a pseudopeptide backbone and only the four natural nucleobases are retained. The neutral backbone of a PNA can allow for specific hybridization to DNA and RNA under conditions of low ionic strength. Synthesis of PNA oligomers can be performed using standard solid phase peptide synthesis protocols as described, for example, in Hyrup et al., (1996) supra and Perry-O'Keefe et al., Proc. Natl. Acad. Sci. 93: 14670-675 (1996).

PNA nucleic acids can be used in prognostic, diagnostic, and therapeuticapplications. For example, PNAs can be used as antisense or antigeneagents for sequence-specific modulation of gene expression by, forexample, inducing transcription or translation arrest or inhibitingreplication. PNA nucleic acid molecules can also be used in the analysisof single base pair mutations in a gene, (e.g., by PNA-directed PCRclamping); as “artificial restriction enzymes” when used in combinationwith other enzymes, (e.g., S1 nucleases (Hyrup (1996) supra)); or asprobes or primers for DNA sequencing or hybridization (Hyrup et al.,(1996) supra; Perry-O'Keefe supra).

In other embodiments, oligonucleotides may include other appended groups such as peptides (e.g., for targeting host cell receptors in vivo), or agents facilitating transport across cell membranes (see e.g., Letsinger et al., Proc. Natl. Acad. Sci. USA 86: 6553-6556 (1989); Lemaitre et al., Proc. Natl. Acad. Sci. USA 84: 648-652 (1987); PCT Publication No. WO 88/09810) or the blood-brain barrier (see, e.g., PCT Publication No. WO 89/10134). In addition, oligonucleotides can be modified with hybridization-triggered cleavage agents (see, e.g., Krol et al., Bio-Techniques 6: 958-976 (1988)) or intercalating agents (see, e.g., Zon, Pharm. Res. 5: 539-549 (1988)). To this end, the oligonucleotide may be conjugated to another molecule (e.g., a peptide, hybridization-triggered cross-linking agent, transport agent, or hybridization-triggered cleavage agent).

Also included herein are molecular beacon oligonucleotide primer andprobe molecules having one or more regions complementary to a nucleotidesequence of SEQ ID NO: 1-12 or a substantially identical sequencethereof, two complementary regions one having a fluorophore and one aquencher such that the molecular beacon is useful for quantifying thepresence of the nucleic acid in a sample. Molecular beacon nucleic acidsare described, for example, in Lizardi et al., U.S. Pat. No. 5,854,033;Nazarenko et al., U.S. Pat. No. 5,866,336, and Livak et al., U.S. Pat.No. 5,876,930.

Antibodies

The term “antibody” as used herein refers to an immunoglobulin moleculeor immunologically active portion thereof, i.e., an antigen-bindingportion. Examples of immunologically active portions of immunoglobulinmolecules include F(ab) and F(ab′)₂ fragments which can be generated bytreating the antibody with an enzyme such as pepsin. An antibodysometimes is a polyclonal, monoclonal, recombinant (e.g., a chimeric orhumanized), fully human, non-human (e.g., murine), or a single chainantibody. An antibody may have effector function and can fix complement,and is sometimes coupled to a toxin or imaging agent.

A full-length polypeptide or antigenic peptide fragment encoded by aICAM, MAPK10, KIAA0861, NUMA1 or GALE nucleotide sequence can be used asan immunogen or can be used to identify antibodies made with otherimmunogens, e.g., cells, membrane preparations, and the like. Anantigenic peptide often includes at least 8 amino acid residues of theamino acid sequences encoded by a nucleotide sequence of SEQ ID NO:1-12, or substantially identical sequence thereof, and encompasses anepitope. Antigenic peptides sometimes include 10 or more amino acids, 15or more amino acids, 20 or more amino acids, or 30 or more amino acids.Hydrophilic and hydrophobic fragments of polypeptides sometimes are usedas immunogens.

Epitopes encompassed by the antigenic peptide are regions located on thesurface of the polypeptide (e.g., hydrophilic regions) as well asregions with high antigenicity. For example, an Emini surfaceprobability analysis of the human polypeptide sequence can be used toindicate the regions that have a particularly high probability of beinglocalized to the surface of the polypeptide and are thus likely toconstitute surface residues useful for targeting antibody production.The antibody may bind an epitope on any domain or region on polypeptidesdescribed herein.

Also, chimeric, humanized, and completely human antibodies are useful for applications which include repeated administration to subjects. Chimeric and humanized monoclonal antibodies, comprising both human and non-human portions, can be made using standard recombinant DNA techniques. Such chimeric and humanized monoclonal antibodies can be produced by recombinant DNA techniques known in the art, for example using methods described in Robinson et al. International Application No. PCT/US86/02269; Akira et al. European Patent Application 184,187; Taniguchi, M., European Patent Application 171,496; Morrison et al. European Patent Application 173,494; Neuberger et al. PCT International Publication No. WO 86/01533; Cabilly et al. U.S. Pat. No. 4,816,567; Cabilly et al. European Patent Application 125,023; Better et al., Science 240: 1041-1043 (1988); Liu et al., Proc. Natl. Acad. Sci. USA 84: 3439-3443 (1987); Liu et al., J. Immunol. 139: 3521-3526 (1987); Sun et al., Proc. Natl. Acad. Sci. USA 84: 214-218 (1987); Nishimura et al., Canc. Res. 47: 999-1005 (1987); Wood et al., Nature 314: 446-449 (1985); Shaw et al., J. Natl. Cancer Inst. 80: 1553-1559 (1988); Morrison, S. L., Science 229: 1202-1207 (1985); Oi et al., BioTechniques 4: 214 (1986); Winter U.S. Pat. No. 5,225,539; Jones et al., Nature 321: 522-525 (1986); Verhoeyan et al., Science 239:1534; and Beidler et al., J. Immunol. 141: 4053-4060 (1988).

Completely human antibodies are particularly desirable for therapeutictreatment of human patients. Such antibodies can be produced usingtransgenic mice that are incapable of expressing endogenousimmunoglobulin heavy and light chains genes, but which can express humanheavy and light chain genes. See, for example, Lonberg and Huszar, Int.Rev. Immunol. 13: 65-93 (1995); and U.S. Pat. Nos. 5,625,126; 5,633,425;5,569,825; 5,661,016; and 5,545,806. In addition, companies such asAbgenix, Inc. (Fremont, Calif.) and Medarex, Inc. (Princeton, N.J.), canbe engaged to provide human antibodies directed against a selectedantigen using technology similar to that described above. Completelyhuman antibodies that recognize a selected epitope also can be generatedusing a technique referred to as “guided selection.” In this approach aselected non-human monoclonal antibody (e.g., a murine antibody) is usedto guide the selection of a completely human antibody recognizing thesame epitope. This technology is described for example by Jespers etal., Bio/Technology 12: 899-903 (1994).

An antibody can be a single chain antibody. A single chain antibody (scFv) can be engineered (see, e.g., Colcher et al., Ann. N Y Acad. Sci. 880: 263-80 (1999); and Reiter, Clin. Cancer Res. 2: 245-52 (1996)). Single chain antibodies can be dimerized or multimerized to generate multivalent antibodies having specificities for different epitopes of the same target polypeptide.

Antibodies also may be selected or modified so that they exhibit reduced or no ability to bind an Fc receptor. For example, an antibody may be an isotype or subtype, fragment or other mutant, which does not support binding to an Fc receptor (e.g., it has a mutagenized or deleted Fc receptor binding region).

Also, an antibody (or fragment thereof) may be conjugated to a therapeutic moiety such as a cytotoxin, a therapeutic agent or a radioactive metal ion. A cytotoxin or cytotoxic agent includes any agent that is detrimental to cells. Examples include taxol, cytochalasin B, gramicidin D, ethidium bromide, emetine, mitomycin, etoposide, teniposide, vincristine, vinblastine, colchicine, doxorubicin, daunorubicin, dihydroxy anthracin dione, mitoxantrone, mithramycin, actinomycin D, 1-dehydrotestosterone, glucocorticoids, procaine, tetracaine, lidocaine, propranolol, and puromycin and analogs or homologs thereof. Therapeutic agents include, but are not limited to, antimetabolites (e.g., methotrexate, 6-mercaptopurine, 6-thioguanine, cytarabine, 5-fluorouracil, dacarbazine), alkylating agents (e.g., mechlorethamine, thiotepa, chlorambucil, melphalan, carmustine (BCNU) and lomustine (CCNU), cyclophosphamide, busulfan, dibromomannitol, streptozotocin, mitomycin C, and cis-dichlorodiamine platinum (II) (DDP) cisplatin), anthracyclines (e.g., daunorubicin (formerly daunomycin) and doxorubicin), antibiotics (e.g., dactinomycin (formerly actinomycin), bleomycin, mithramycin, and anthramycin (AMC)), and anti-mitotic agents (e.g., vincristine and vinblastine).

Antibody conjugates can be used for modifying a given biologicalresponse. For example, the drug moiety may be a protein or polypeptidepossessing a desired biological activity. Such proteins may include, forexample, a toxin such as abrin, ricin A, pseudomonas exotoxin, ordiphtheria toxin; a polypeptide such as tumor necrosis factor,γ-interferon, α-interferon, nerve growth factor, platelet derived growthfactor, tissue plasminogen activator; or, biological response modifierssuch as, for example, lymphokines, interleukin-1 (“IL-1”), interleukin-2(“IL-2”), interleukin-6 (“IL-6”), granulocyte macrophage colonystimulating factor (“GM-CSF”), granulocyte colony stimulating factor(“G-CSF”), or other growth factors. Also, an antibody can be conjugatedto a second antibody to form an antibody heteroconjugate as described bySegal in U.S. Pat. No. 4,676,980, for example.

An antibody (e.g., monoclonal antibody) can be used to isolate targetpolypeptides by standard techniques, such as affinity chromatography orimmunoprecipitation. Moreover, an antibody can be used to detect atarget polypeptide (e.g., in a cellular lysate or cell supernatant) inorder to evaluate the abundance and pattern of expression of thepolypeptide. Antibodies can be used diagnostically to monitorpolypeptide levels in tissue as part of a clinical testing procedure,e.g., to determine the efficacy of a given treatment regimen. Detectioncan be facilitated by coupling (i.e., physically linking) the antibodyto a detectable substance (i.e., antibody labeling). Examples ofdetectable substances include various enzymes, prosthetic groups,fluorescent materials, luminescent materials, bioluminescent materials,and radioactive materials. Examples of suitable enzymes includehorseradish peroxidase, alkaline phosphatase, β-galactosidase, oracetylcholinesterase; examples of suitable prosthetic group complexesinclude streptavidin/biotin and avidin/biotin; examples of suitablefluorescent materials include umbelliferone, fluorescein, fluoresceinisothiocyanate, rhodamine, dichlorotriazinylamine fluorescein, dansylchloride or phycoerythrin; an example of a luminescent material includesluminol; examples of bioluminescent materials include luciferase,luciferin, and aequorin, and examples of suitable radioactive materialinclude ¹²⁵I, ¹³¹I, ³⁵S or ³H. Also, an antibody can be utilized as atest molecule for determining whether it can treat breast cancer, and asa therapeutic for administration to a subject for treating breastcancer.

An antibody can be made by immunizing with a purified antigen, or a fragment thereof, e.g., a fragment described herein, a membrane associated antigen, tissues, e.g., crude tissue preparations, whole cells, preferably living cells, lysed cells, or cell fractions.

Included herein are antibodies which bind only a native polypeptide, only denatured or otherwise non-native polypeptide, or which bind both, as well as those having linear or conformational epitopes. Conformational epitopes sometimes can be identified by selecting antibodies that bind to native but not denatured polypeptide. Also featured are antibodies that specifically bind to a polypeptide variant associated with breast cancer.

Screening Assays

Featured herein are methods for identifying a candidate therapeutic fortreating breast cancer. The methods comprise contacting a test moleculewith a target molecule in a system. A “target molecule” as used hereinrefers to a nucleic acid of SEQ ID NO: 1-12, a substantially identicalnucleic acid thereof, or a fragment thereof, and an encoded polypeptideof the foregoing. The method also comprises determining the presence orabsence of an interaction between the test molecule and the targetmolecule, where the presence of an interaction between the test moleculeand the nucleic acid or polypeptide identifies the test molecule as acandidate breast cancer therapeutic. The interaction between the testmolecule and the target molecule may be quantified.

Test molecules and candidate therapeutics include, but are not limited to, compounds, antisense nucleic acids, siRNA molecules, ribozymes, polypeptides or proteins encoded by an ICAM, MAPK10, KIAA0861, NUMA1 or GALE nucleic acid, or a substantially identical sequence or fragment thereof, and immunotherapeutics (e.g., antibodies and HLA-presented polypeptide fragments). A test molecule or candidate therapeutic may act as a modulator of target molecule concentration or target molecule function in a system. A “modulator” may agonize (i.e., up-regulate) or antagonize (i.e., down-regulate) a target molecule concentration partially or completely in a system by affecting such cellular functions as DNA replication and/or DNA processing (e.g., DNA methylation or DNA repair), RNA transcription and/or RNA processing (e.g., removal of intronic sequences and/or translocation of spliced mRNA from the nucleus), polypeptide production (e.g., translation of the polypeptide from mRNA), and/or polypeptide post-translational modification (e.g., glycosylation, phosphorylation, and proteolysis of pro-polypeptides). A modulator may also agonize or antagonize a biological function of a target molecule partially or completely, where the function may include adopting a certain structural conformation, interacting with one or more binding partners, ligand binding, catalysis (e.g., phosphorylation, dephosphorylation, hydrolysis, methylation, and isomerization), and an effect upon a cellular event (e.g., affecting progression of breast cancer).

As used herein, the term “system” refers to a cell free in vitro environment and a cell-based environment such as a collection of cells, a tissue, an organ, or an organism. A system is “contacted” with a test molecule in a variety of manners, including adding molecules in solution and allowing them to interact with one another by diffusion, cell injection, and any administration routes in an animal. As used herein, the term “interaction” refers to an effect of a test molecule on a target molecule, where the effect sometimes is binding between the test molecule and the target molecule, and sometimes is an observable change in cells, tissue, or organism.

There are many standard methods for detecting the presence or absence of an interaction between a test molecule and a target molecule. For example, titrimetric, acidimetric, radiometric, NMR, monolayer, polarographic, spectrophotometric, fluorescent, and ESR assays probative of a target molecule interaction may be utilized.

In general, an interaction can be determined by labeling the testmolecule and/or the ICAM, MAPK10, KIAA0861, NUMA1 or GALE molecule,where the label is covalently or non-covalently attached to the testmolecule or ICAM, MAPK10, KIAA0861, NUMA1 or GALE molecule. The label issometimes a radioactive molecule such as ¹²⁵I, ¹³¹I, ³⁵S or ³H, whichcan be detected by direct counting of radioemission or by scintillationcounting. Also, enzymatic labels such as horseradish peroxidase,alkaline phosphatase, or luciferase may be utilized where the enzymaticlabel can be detected by determining conversion of an appropriatesubstrate to product. Also, presence or absence of an interaction can bedetermined without labeling. For example, a microphysiometer (e.g.,Cytosensor) is an analytical instrument that measures the rate at whicha cell acidifies its environment using a light-addressablepotentiometric sensor (LAPS). Changes in this acidification rate can beused as an indication of an interaction between a test molecule andICAM, MAPK10, KIAA0861, NUMA1 or GALE (McConnell, H. M. et al., Science257: 1906-1912 (1992)).

In cell-based systems, cells typically include an ICAM, MAPK10, KIAA0861, NUMA1 or GALE nucleic acid or polypeptide or variants thereof and are often of mammalian origin, although the cell can be of any origin. Whole cells, cell homogenates, and cell fractions (e.g., cell membrane fractions) can be subjected to analysis. Where interactions between a test molecule and an ICAM, MAPK10, KIAA0861, NUMA1 or GALE polypeptide or variant thereof are monitored, soluble and/or membrane bound forms of the polypeptide or variant may be utilized. Where membrane-bound forms of the polypeptide are used, it may be desirable to utilize a solubilizing agent. Examples of such solubilizing agents include non-ionic detergents such as n-octylglucoside, n-dodecylglucoside, n-dodecylmaltoside, octanoyl-N-methylglucamide, decanoyl-N-methylglucamide, Triton® X-100, Triton® X-114, Thesit®, Isotridecypoly(ethylene glycol ether)n, 3-[(3-cholamidopropyl)dimethylammonio]-1-propane sulfonate (CHAPS), 3-[(3-cholamidopropyl)dimethylammonio]-2-hydroxy-1-propane sulfonate (CHAPSO), or N-dodecyl-N,N-dimethyl-3-ammonio-1-propane sulfonate.

An interaction between two molecules also can be detected by monitoring fluorescence energy transfer (FET) (see, for example, Lakowicz et al., U.S. Pat. No. 5,631,169; Stavrianopoulos et al. U.S. Pat. No. 4,868,103). A fluorophore label on a first, “donor” molecule is selected such that its emitted fluorescent energy will be absorbed by a fluorescent label on a second, “acceptor” molecule, which in turn is able to fluoresce due to the absorbed energy. Alternately, the “donor” polypeptide molecule may simply utilize the natural fluorescent energy of tryptophan residues. Labels are chosen that emit different wavelengths of light, such that the “acceptor” molecule label may be differentiated from that of the “donor”. Since the efficiency of energy transfer between the labels is related to the distance separating the molecules, the spatial relationship between the molecules can be assessed. In a situation in which binding occurs between the molecules, the fluorescent emission of the “acceptor” molecule label in the assay should be maximal. An FET binding event can be conveniently measured through standard fluorometric detection means well known in the art (e.g., using a fluorimeter).
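The distance dependence of energy transfer mentioned above can be illustrated with a short sketch. The text states only that transfer efficiency is related to the distance between labels; the Förster relation E = 1 / (1 + (r/R0)^6) used here is the standard model and is an assumption brought in for illustration, as are the function name and the 5 nm Förster radius.

```python
def fret_efficiency(distance_nm: float, forster_radius_nm: float) -> float:
    """Energy-transfer efficiency from the Foerster relation E = 1 / (1 + (r/R0)^6).

    Both arguments are hypothetical inputs; R0 depends on the donor/acceptor pair.
    """
    if distance_nm < 0 or forster_radius_nm <= 0:
        raise ValueError("distance must be >= 0 and Foerster radius must be > 0")
    ratio = distance_nm / forster_radius_nm
    return 1.0 / (1.0 + ratio ** 6)


if __name__ == "__main__":
    # Illustrative distances only: bound labels are close, unbound labels are far apart.
    for r in (2.0, 5.0, 8.0):
        print(f"r = {r:.1f} nm -> efficiency = {fret_efficiency(r, 5.0):.3f}")
```

Under this model the efficiency falls steeply around the Förster radius, which is why acceptor emission can serve as a readout of whether the two labeled molecules are bound.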

In another embodiment, determining the presence or absence of an interaction between a test molecule and an ICAM, MAPK10, KIAA0861, NUMA1 or GALE molecule can be effected by using real-time Biomolecular Interaction Analysis (BIA) (see, e.g., Sjolander & Urbaniczky, Anal. Chem. 63: 2338-2345 (1991) and Szabo et al., Curr. Opin. Struct. Biol. 5: 699-705 (1995)). “Surface plasmon resonance” or “BIA” detects biospecific interactions in real time, without labeling any of the interactants (e.g., BIAcore). Changes in the mass at the binding surface (indicative of a binding event) result in alterations of the refractive index of light near the surface (the optical phenomenon of surface plasmon resonance (SPR)), resulting in a detectable signal which can be used as an indication of real-time reactions between biological molecules.

In another embodiment, the ICAM, MAPK10, KIAA0861, NUMA1 or GALE molecule or test molecules are anchored to a solid phase. The ICAM, MAPK10, KIAA0861, NUMA1 or GALE molecule/test molecule complexes anchored to the solid phase can be detected at the end of the reaction. The target ICAM, MAPK10, KIAA0861, NUMA1 or GALE molecule is often anchored to a solid surface, and the test molecule, which is not anchored, can be labeled, either directly or indirectly, with detectable labels discussed herein.

It may be desirable to immobilize an ICAM, MAPK10, KIAA0861, NUMA1 or GALE molecule, an anti-ICAM, MAPK10, KIAA0861, NUMA1 or GALE antibody, or test molecules to facilitate separation of complexed from uncomplexed forms of ICAM, MAPK10, KIAA0861, NUMA1 or GALE molecules and test molecules, as well as to accommodate automation of the assay. Binding of a test molecule to an ICAM, MAPK10, KIAA0861, NUMA1 or GALE molecule can be accomplished in any vessel suitable for containing the reactants. Examples of such vessels include microtiter plates, test tubes, and micro-centrifuge tubes. In one embodiment, a fusion polypeptide can be provided which adds a domain that allows an ICAM, MAPK10, KIAA0861, NUMA1 or GALE molecule to be bound to a matrix. For example, glutathione-S-transferase/ICAM, MAPK10, KIAA0861, NUMA1 or GALE fusion polypeptides or glutathione-S-transferase/target fusion polypeptides can be adsorbed onto glutathione sepharose beads (Sigma Chemical, St. Louis, Mo.) or glutathione derivatized microtiter plates, which are then combined with the test compound or the test compound and either the non-adsorbed target polypeptide or ICAM, MAPK10, KIAA0861, NUMA1 or GALE polypeptide, and the mixture incubated under conditions conducive to complex formation (e.g., at physiological conditions for salt and pH). Following incubation, the beads or microtiter plate wells are washed to remove any unbound components, the matrix is immobilized in the case of beads, and complex formation is determined either directly or indirectly, for example, as described above. Alternatively, the complexes can be dissociated from the matrix, and the level of ICAM, MAPK10, KIAA0861, NUMA1 or GALE binding or activity determined using standard techniques.

Other techniques for immobilizing an ICAM, MAPK10, KIAA0861, NUMA1 or GALE molecule on matrices include using biotin and streptavidin. For example, biotinylated ICAM, MAPK10, KIAA0861, NUMA1 or GALE polypeptide or target molecules can be prepared from biotin-NHS (N-hydroxy-succinimide) using techniques known in the art (e.g., biotinylation kit, Pierce Chemical, Rockford, Ill.), and immobilized in the wells of streptavidin-coated 96 well plates (Pierce Chemical).

In order to conduct the assay, the non-immobilized component is added to the coated surface containing the anchored component. After the reaction is complete, unreacted components are removed (e.g., by washing) under conditions such that any complexes formed will remain immobilized on the solid surface. The detection of complexes anchored on the solid surface can be accomplished in a number of ways. Where the previously non-immobilized component is pre-labeled, the detection of label immobilized on the surface indicates that complexes were formed. Where the previously non-immobilized component is not pre-labeled, an indirect label can be used to detect complexes anchored on the surface; e.g., using a labeled antibody specific for the immobilized component (the antibody, in turn, can be directly labeled or indirectly labeled with, e.g., a labeled anti-Ig antibody).

In one embodiment, this assay is performed utilizing antibodies reactive with ICAM, MAPK10, KIAA0861, NUMA1 or GALE polypeptide or test molecules but which do not interfere with binding of the ICAM, MAPK10, KIAA0861, NUMA1 or GALE polypeptide to its test molecule. Such antibodies can be derivatized to the wells of the plate, and unbound target or ICAM, MAPK10, KIAA0861, NUMA1 or GALE polypeptide trapped in the wells by antibody conjugation. Methods for detecting such complexes, in addition to those described above for the GST-immobilized complexes, include immunodetection of complexes using antibodies reactive with the ICAM, MAPK10, KIAA0861, NUMA1 or GALE polypeptide or target molecule, as well as enzyme-linked assays which rely on detecting an enzymatic activity associated with the ICAM, MAPK10, KIAA0861, NUMA1 or GALE polypeptide or test molecule.

Alternatively, cell free assays can be conducted in a liquid phase. In such an assay, the reaction products are separated from unreacted components, by any of a number of standard techniques, including but not limited to: differential centrifugation (see, for example, Rivas, G., and Minton, A. P., Trends Biochem Sci August; 18(8): 284-7 (1993)); chromatography (gel filtration chromatography, ion-exchange chromatography); electrophoresis (see, e.g., Ausubel et al., eds. Current Protocols in Molecular Biology, J. Wiley: New York (1999)); and immunoprecipitation (see, for example, Ausubel, F. et al., eds. Current Protocols in Molecular Biology, J. Wiley: New York (1999)). Such resins and chromatographic techniques are known to one skilled in the art (see, e.g., Heegaard, J Mol. Recognit. Winter; 11(1-6): 141-8 (1998); Hage & Tweed, J. Chromatogr. B Biomed. Sci. Appl. October 10; 699 (1-2): 499-525 (1997)). Further, fluorescence energy transfer may also be conveniently utilized, as described herein, to detect binding without further purification of the complex from solution.

In another embodiment, modulators of ICAM, MAPK10, KIAA0861, NUMA1 or GALE expression are identified. For example, a cell or cell free mixture is contacted with a candidate compound and the expression of ICAM, MAPK10, KIAA0861, NUMA1 or GALE mRNA or polypeptide evaluated relative to the level of expression of ICAM, MAPK10, KIAA0861, NUMA1 or GALE mRNA or polypeptide in the absence of the candidate compound. When expression of ICAM, MAPK10, KIAA0861, NUMA1 or GALE mRNA or polypeptide is greater in the presence of the candidate compound than in its absence, the candidate compound is identified as a stimulator of ICAM, MAPK10, KIAA0861, NUMA1 or GALE mRNA or polypeptide expression. Alternatively, when expression of ICAM, MAPK10, KIAA0861, NUMA1 or GALE mRNA or polypeptide is less (statistically significantly less) in the presence of the candidate compound than in its absence, the candidate compound is identified as an inhibitor of ICAM, MAPK10, KIAA0861, NUMA1 or GALE mRNA or polypeptide expression. The level of ICAM, MAPK10, KIAA0861, NUMA1 or GALE mRNA or polypeptide expression can be determined by methods described herein for detecting ICAM, MAPK10, KIAA0861, NUMA1 or GALE mRNA or polypeptide.
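The comparison described above can be sketched as a small classification routine. This is a minimal illustration, not the method of the specification: the text does not name a statistical test, so an exact two-sided permutation test on the difference of means is assumed here, the significance requirement is applied to stimulators as well as inhibitors, and the function names, the 0.05 threshold and the replicate values are hypothetical.

```python
from itertools import combinations
from statistics import mean


def permutation_p_value(treated, control):
    """Exact two-sided permutation test on the difference of group means."""
    pooled = list(treated) + list(control)
    n_treated = len(treated)
    observed = abs(mean(treated) - mean(control))
    indices = range(len(pooled))
    total = 0
    extreme = 0
    for chosen in combinations(indices, n_treated):
        chosen_set = set(chosen)
        group_a = [pooled[i] for i in chosen]
        group_b = [pooled[i] for i in indices if i not in chosen_set]
        total += 1
        # Small tolerance so the observed split always counts as extreme.
        if abs(mean(group_a) - mean(group_b)) >= observed - 1e-12:
            extreme += 1
    return extreme / total


def classify_modulator(treated, control, alpha=0.05):
    """Label a candidate compound from expression levels with and without it."""
    if permutation_p_value(treated, control) >= alpha:
        return "no significant effect"
    return "stimulator" if mean(treated) > mean(control) else "inhibitor"


if __name__ == "__main__":
    # Hypothetical relative expression levels, four replicates per condition.
    with_compound = [2.1, 2.4, 2.2, 2.6]
    without_compound = [1.0, 1.1, 0.9, 1.2]
    print(classify_modulator(with_compound, without_compound))
```

With four replicates per group there are 70 possible splits, so a complete separation of the groups gives p = 2/70; with only three replicates per group the smallest attainable p-value is 0.1, and no compound could be called significant at the assumed threshold.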

In another embodiment, binding partners that interact with an ICAM, MAPK10, KIAA0861, NUMA1 or GALE molecule are detected. The ICAM, MAPK10, KIAA0861, NUMA1 or GALE molecules can interact with one or more cellular or extracellular macromolecules, such as polypeptides, in vivo, and these molecules that interact with ICAM, MAPK10, KIAA0861, NUMA1 or GALE molecules are referred to herein as “binding partners.” Molecules that disrupt such interactions can be useful in regulating the activity of the target gene product. Such molecules can include, but are not limited to, molecules such as antibodies, peptides, and small molecules. Target genes/products for use in this embodiment often are the ICAM, MAPK10, KIAA0861, NUMA1 or GALE genes herein identified. In an alternative embodiment, provided is a method for determining the ability of the test compound to modulate the activity of an ICAM, MAPK10, KIAA0861, NUMA1 or GALE polypeptide through modulation of the activity of a downstream effector of an ICAM, MAPK10, KIAA0861, NUMA1 or GALE target molecule. For example, the activity of the effector molecule on an appropriate target can be determined, or the binding of the effector to an appropriate target can be determined, as previously described.

To identify compounds that interfere with the interaction between the target gene product and its cellular or extracellular binding partner(s), e.g., a substrate, a reaction mixture containing the target gene product and the binding partner is prepared, under conditions and for a time sufficient to allow the two products to form a complex. In order to test an inhibitory agent, the reaction mixture is provided in the presence and absence of the test compound. The test compound can be initially included in the reaction mixture, or can be added at a time subsequent to the addition of the target gene product and its cellular or extracellular binding partner. Control reaction mixtures are incubated without the test compound or with a placebo. The formation of any complexes between the target gene product and the cellular or extracellular binding partner is then detected. The formation of a complex in the control reaction, but not in the reaction mixture containing the test compound, indicates that the compound interferes with the interaction of the target gene product and the interactive binding partner. Additionally, complex formation within reaction mixtures containing the test compound and normal target gene product can also be compared to complex formation within reaction mixtures containing the test compound and mutant target gene product. This comparison can be important in those cases where it is desirable to identify compounds that disrupt interactions of mutant but not normal target gene products.
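The control-versus-test comparison in the preceding paragraph can be expressed as simple arithmetic. The sketch below is illustrative only: the specification does not define a numerical readout, so a background-corrected complex signal, the 50% cutoff, and all function and parameter names are assumptions.

```python
def percent_inhibition(signal_with_compound, signal_control, background=0.0):
    """Percent loss of complex signal relative to the control reaction."""
    control_net = signal_control - background
    if control_net <= 0:
        raise ValueError("control signal must exceed background")
    test_net = signal_with_compound - background
    return 100.0 * (1.0 - test_net / control_net)


def interferes(signal_with_compound, signal_control, background=0.0,
               threshold_percent=50.0):
    """True when the compound reduces complex formation by at least the cutoff."""
    inhibition = percent_inhibition(signal_with_compound, signal_control, background)
    return inhibition >= threshold_percent


def selective_for_mutant(inhibition_mutant, inhibition_normal, threshold_percent=50.0):
    """True when the mutant complex is disrupted but the normal complex is not."""
    return inhibition_mutant >= threshold_percent > inhibition_normal


if __name__ == "__main__":
    # Hypothetical plate-reader counts: test well, control well, blank well.
    print(percent_inhibition(30.0, 110.0, background=10.0))
```

The last helper mirrors the mutant-versus-normal comparison: a compound is of interest in that setting when it disrupts the mutant target gene product's interaction while leaving the normal one largely intact.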

These assays can be conducted in a heterogeneous or homogeneous format. Heterogeneous assays involve anchoring either the target gene product or the binding partner onto a solid phase, and detecting complexes anchored on the solid phase at the end of the reaction. In homogeneous assays, the entire reaction is carried out in a liquid phase. In either approach, the order of addition of reactants can be varied to obtain different information about the compounds being tested. For example, test compounds that interfere with the interaction between the target gene products and the binding partners, e.g., by competition, can be identified by conducting the reaction in the presence of the test substance. Alternatively, test compounds that disrupt preformed complexes, e.g., compounds with higher binding constants that displace one of the components from the complex, can be tested by adding the test compound to the reaction mixture after complexes have been formed. The various formats are briefly described below.

In a heterogeneous assay system, either the target gene product or the interactive cellular or extracellular binding partner is anchored onto a solid surface (e.g., a microtiter plate), while the non-anchored species is labeled, either directly or indirectly. The anchored species can be immobilized by non-covalent or covalent attachments. Alternatively, an immobilized antibody specific for the species to be anchored can be used to anchor the species to the solid surface.

In order to conduct the assay, the partner of the immobilized species is exposed to the coated surface with or without the test compound. After the reaction is complete, unreacted components are removed (e.g., by washing) and any complexes formed will remain immobilized on the solid surface. Where the non-immobilized species is pre-labeled, the detection of label immobilized on the surface indicates that complexes were formed. Where the non-immobilized species is not pre-labeled, an indirect label can be used to detect complexes anchored on the surface; e.g., using a labeled antibody specific for the initially non-immobilized species (the antibody, in turn, can be directly labeled or indirectly labeled with, e.g., a labeled anti-Ig antibody). Depending upon the order of addition of reaction components, test compounds that inhibit complex formation or that disrupt preformed complexes can be detected.

Alternatively, the reaction can be conducted in a liquid phase in the presence or absence of the test compound, the reaction products separated from unreacted components, and complexes detected; e.g., using an immobilized antibody specific for one of the binding components to anchor any complexes formed in solution, and a labeled antibody specific for the other partner to detect anchored complexes. Again, depending upon the order of addition of reactants to the liquid phase, test compounds that inhibit complex formation or that disrupt preformed complexes can be identified.

In an alternate embodiment, a homogeneous assay can be used. For example, a preformed complex of the target gene product and the interactive cellular or extracellular binding partner product is prepared in which either the target gene products or their binding partners are labeled, but the signal generated by the label is quenched due to complex formation (see, e.g., U.S. Pat. No. 4,109,496 that utilizes this approach for immunoassays). The addition of a test substance that competes with and displaces one of the species from the preformed complex will result in the generation of a signal above background. In this way, test substances that disrupt target gene product-binding partner interaction can be identified.

Also, binding partners of ICAM, MAPK10, KIAA0861, NUMA1 or GALE molecules can be identified in a two-hybrid assay or three-hybrid assay (see, e.g., U.S. Pat. No. 5,283,317; Zervos et al., Cell 72: 223-232 (1993); Madura et al., J. Biol. Chem. 268: 12046-12054 (1993); Bartel et al., Biotechniques 14: 920-924 (1993); Iwabuchi et al., Oncogene 8: 1693-1696 (1993); and Brent WO94/10300), to identify other polypeptides, which bind to or interact with ICAM, MAPK10, KIAA0861, NUMA1 or GALE (“ICAM, MAPK10, KIAA0861, NUMA1 or GALE-binding polypeptides” or “ICAM, MAPK10, KIAA0861, NUMA1 or GALE-bp”) and are involved in ICAM, MAPK10, KIAA0861, NUMA1 or GALE activity. Such ICAM, MAPK10, KIAA0861, NUMA1 or GALE-bps can be activators or inhibitors of signals by the ICAM, MAPK10, KIAA0861, NUMA1 or GALE polypeptides or ICAM, MAPK10, KIAA0861, NUMA1 or GALE targets as, for example, downstream elements of an ICAM, MAPK10, KIAA0861, NUMA1 or GALE-mediated signaling pathway.

A two-hybrid system is based on the modular nature of most transcription factors, which consist of separable DNA-binding and activation domains. Briefly, the assay utilizes two different DNA constructs. In one construct, the gene that codes for an ICAM, MAPK10, KIAA0861, NUMA1 or GALE polypeptide (the “bait”) is fused to a gene encoding the DNA binding domain of a known transcription factor (e.g., GAL-4). In the other construct, a DNA sequence, from a library of DNA sequences, that encodes an unidentified polypeptide (“prey” or “sample”) is fused to a gene that codes for the activation domain of the known transcription factor. (Alternatively, the ICAM, MAPK10, KIAA0861, NUMA1 or GALE polypeptide can be fused to the activator domain.) If the “bait” and the “prey” polypeptides are able to interact, in vivo, forming an ICAM, MAPK10, KIAA0861, NUMA1 or GALE-dependent complex, the DNA-binding and activation domains of the transcription factor are brought into close proximity. This proximity allows transcription of a reporter gene (e.g., LacZ) which is operably linked to a transcriptional regulatory site responsive to the transcription factor. Expression of the reporter gene can be detected and cell colonies containing the functional transcription factor can be isolated and used to obtain the cloned gene which encodes the polypeptide which interacts with the ICAM, MAPK10, KIAA0861, NUMA1 or GALE polypeptide.

Candidate therapeutics for treating breast cancer are identified from a group of test molecules that interact with an ICAM, MAPK10, KIAA0861, NUMA1 or GALE nucleic acid or polypeptide. Test molecules are normally ranked according to the degree with which they interact with or modulate (e.g., agonize or antagonize) DNA replication and/or processing, RNA transcription and/or processing, polypeptide production and/or processing, and/or function of ICAM, MAPK10, KIAA0861, NUMA1 or GALE molecules, for example, and then top ranking modulators are selected. In a preferred embodiment, the candidate therapeutic (i.e., test molecule) acts as an ICAM, MAPK10, KIAA0861, NUMA1 or GALE antagonist. Also, pharmacogenomic information described herein can determine the rank of a modulator. Candidate therapeutics typically are formulated for administration to a subject.
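The ranking and selection step described above can be sketched as a sort over scored test molecules. This is a minimal illustration under stated assumptions: the specification does not define a scoring scale, so a signed percent modulation (negative for antagonists, positive for agonists) is assumed, and the class, function and compound names are hypothetical.

```python
from typing import List, NamedTuple


class Candidate(NamedTuple):
    name: str
    percent_modulation: float  # assumed convention: negative = antagonist


def top_modulators(candidates: List[Candidate], n: int,
                   antagonists_only: bool = False) -> List[Candidate]:
    """Rank by magnitude of modulation and keep the n strongest."""
    if antagonists_only:
        pool = [c for c in candidates if c.percent_modulation < 0]
    else:
        pool = list(candidates)
    ranked = sorted(pool, key=lambda c: abs(c.percent_modulation), reverse=True)
    return ranked[:n]


if __name__ == "__main__":
    # Hypothetical screening results.
    screen = [
        Candidate("cmpd-A", -72.0),
        Candidate("cmpd-B", 15.0),
        Candidate("cmpd-C", -40.0),
        Candidate("cmpd-D", 88.0),
    ]
    for hit in top_modulators(screen, 2, antagonists_only=True):
        print(hit.name, hit.percent_modulation)
```

The `antagonists_only` switch reflects the preferred embodiment in which the candidate therapeutic acts as an antagonist; other inputs to the rank, such as pharmacogenomic information, are not modeled here.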

Therapeutic Treatments

Formulations or pharmaceutical compositions typically include, in combination with a pharmaceutically acceptable carrier, a compound, an antisense nucleic acid, a ribozyme, an antibody, a binding partner that interacts with an ICAM, MAPK10, KIAA0861, NUMA1 or GALE polypeptide, an ICAM, MAPK10, KIAA0861, NUMA1 or GALE nucleic acid, or a fragment thereof. The formulated molecule may be one that is identified by a screening method described above. As used herein, the term “pharmaceutically acceptable carrier” includes solvents, dispersion media, coatings, antibacterial and antifungal agents, isotonic and absorption delaying agents, and the like, compatible with pharmaceutical administration. Supplementary active compounds can also be incorporated into the compositions.

A pharmaceutical composition is formulated to be compatible with its intended route of administration. Examples of routes of administration include parenteral (e.g., intravenous, intradermal, subcutaneous), oral, inhalation, transdermal (topical), transmucosal, and rectal administration. Solutions or suspensions used for parenteral, intradermal, or subcutaneous application can include the following components: a sterile diluent such as water for injection, saline solution, fixed oils, polyethylene glycols, glycerin, propylene glycol or other synthetic solvents; antibacterial agents such as benzyl alcohol or methyl parabens; antioxidants such as ascorbic acid or sodium bisulfite; chelating agents such as ethylenediaminetetraacetic acid; buffers such as acetates, citrates or phosphates; and agents for the adjustment of tonicity such as sodium chloride or dextrose. pH can be adjusted with acids or bases, such as hydrochloric acid or sodium hydroxide. The parenteral preparation can be enclosed in ampoules, disposable syringes or multiple dose vials made of glass or plastic.

Oral compositions generally include an inert diluent or an edible carrier. For the purpose of oral therapeutic administration, the active compound can be incorporated with excipients and used in the form of tablets, troches, or capsules, e.g., gelatin capsules. Oral compositions can also be prepared using a fluid carrier for use as a mouthwash. Pharmaceutically compatible binding agents, and/or adjuvant materials can be included as part of the composition. The tablets, pills, capsules, troches and the like can contain any of the following ingredients, or compounds of a similar nature: a binder such as microcrystalline cellulose, gum tragacanth or gelatin; an excipient such as starch or lactose; a disintegrating agent such as alginic acid, Primogel, or corn starch; a lubricant such as magnesium stearate or Sterotes; a glidant such as colloidal silicon dioxide; a sweetening agent such as sucrose or saccharin; or a flavoring agent such as peppermint, methyl salicylate, or orange flavoring.

Pharmaceutical compositions suitable for injectable use include sterile aqueous solutions (where water soluble) or dispersions and sterile powders for the extemporaneous preparation of sterile injectable solutions or dispersions. For intravenous administration, suitable carriers include physiological saline, bacteriostatic water, Cremophor EL™ (BASF, Parsippany, N.J.) or phosphate buffered saline (PBS). In all cases, the composition must be sterile and should be fluid to the extent that easy syringability exists. It should be stable under the conditions of manufacture and storage and must be preserved against the contaminating action of microorganisms such as bacteria and fungi. The carrier can be a solvent or dispersion medium containing, for example, water, ethanol, polyol (for example, glycerol, propylene glycol, and liquid polyethylene glycol, and the like), and suitable mixtures thereof. The proper fluidity can be maintained, for example, by the use of a coating such as lecithin, by the maintenance of the required particle size in the case of dispersion and by the use of surfactants. Prevention of the action of microorganisms can be achieved by various antibacterial and antifungal agents, for example, parabens, chlorobutanol, phenol, ascorbic acid, thimerosal, and the like. In many cases, isotonic agents, for example, sugars, polyalcohols such as mannitol and sorbitol, or sodium chloride, sometimes are included in the composition. Prolonged absorption of the injectable compositions can be brought about by including in the composition an agent which delays absorption, for example, aluminum monostearate and gelatin.

Sterile injectable solutions can be prepared by incorporating the active compound in the required amount in an appropriate solvent with one or a combination of ingredients enumerated above, as required, followed by filtered sterilization. Generally, dispersions are prepared by incorporating the active compound into a sterile vehicle which contains a basic dispersion medium and the required other ingredients from those enumerated above. In the case of sterile powders for the preparation of sterile injectable solutions, methods of preparation often utilized are vacuum drying and freeze-drying, which yield a powder of the active ingredient plus any additional desired ingredient from a previously sterile-filtered solution thereof.

For administration by inhalation, the compounds are delivered in the form of an aerosol spray from a pressurized container or dispenser which contains a suitable propellant, e.g., a gas such as carbon dioxide, or a nebulizer.

Systemic administration can also be by transmucosal or transdermal means. For transmucosal or transdermal administration, penetrants appropriate to the barrier to be permeated are used in the formulation. Such penetrants are generally known in the art, and include, for example, for transmucosal administration, detergents, bile salts, and fusidic acid derivatives. Transmucosal administration can be accomplished through the use of nasal sprays or suppositories. For transdermal administration, the active compounds are formulated into ointments, salves, gels, or creams as generally known in the art. Molecules can also be prepared in the form of suppositories (e.g., with conventional suppository bases such as cocoa butter and other glycerides) or retention enemas for rectal delivery.

In one embodiment, active molecules are prepared with carriers that will protect the compound against rapid elimination from the body, such as a controlled release formulation, including implants and microencapsulated delivery systems. Biodegradable, biocompatible polymers can be used, such as ethylene vinyl acetate, polyanhydrides, polyglycolic acid, collagen, polyorthoesters, and polylactic acid. Methods for preparation of such formulations will be apparent to those skilled in the art. Materials can also be obtained commercially from Alza Corporation and Nova Pharmaceuticals, Inc. Liposomal suspensions (including liposomes targeted to infected cells with monoclonal antibodies to viral antigens) can also be used as pharmaceutically acceptable carriers. These can be prepared according to methods known to those skilled in the art, for example, as described in U.S. Pat. No. 4,522,811.

It is advantageous to formulate oral or parenteral compositions in dosage unit form for ease of administration and uniformity of dosage. Dosage unit form as used herein refers to physically discrete units suited as unitary dosages for the subject to be treated; each unit containing a predetermined quantity of active compound calculated to produce the desired therapeutic effect in association with the required pharmaceutical carrier.

Toxicity and therapeutic efficacy of such compounds can be determined by standard pharmaceutical procedures in cell cultures or experimental animals, e.g., for determining the LD₅₀ (the dose lethal to 50% of the population) and the ED₅₀ (the dose therapeutically effective in 50% of the population). The dose ratio between toxic and therapeutic effects is the therapeutic index and it can be expressed as the ratio LD₅₀/ED₅₀. Molecules which exhibit high therapeutic indices often are utilized. While molecules that exhibit toxic side effects may be used, care should be taken to design a delivery system that targets such compounds to the site of affected tissue in order to minimize potential damage to unaffected cells and, thereby, reduce side effects.
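The therapeutic index defined above is a single ratio, shown here as a short sketch. The function name and the example doses are hypothetical; both doses are assumed to be expressed in the same units.

```python
def therapeutic_index(ld50: float, ed50: float) -> float:
    """Therapeutic index as the ratio LD50 / ED50 (same dose units for both)."""
    if ld50 <= 0 or ed50 <= 0:
        raise ValueError("LD50 and ED50 must be positive")
    return ld50 / ed50


if __name__ == "__main__":
    # Hypothetical values in mg/kg; a larger index means a wider safety margin.
    print(therapeutic_index(ld50=200.0, ed50=10.0))
```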

The data obtained from the cell culture assays and animal studies can be used in formulating a range of dosage for use in humans. The dosage of such molecules often lies within a range of circulating concentrations that include the ED₅₀ with little or no toxicity. The dosage may vary within this range depending upon the dosage form employed and the route of administration utilized. For any molecules used in the methods described herein, the therapeutically effective dose can be estimated initially from cell culture assays. A dose may be formulated in animal models to achieve a circulating plasma concentration range that includes the IC₅₀ (i.e., the concentration of the test compound which achieves a half-maximal inhibition of symptoms) as determined in cell culture. Such information can be used to more accurately determine useful doses in humans. Levels in plasma may be measured, for example, by high performance liquid chromatography.
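One simple way to estimate an IC₅₀ from cell culture data is to interpolate between the two tested concentrations that bracket 50% inhibition. The sketch below assumes this log-linear interpolation, which the specification does not prescribe; a full dose-response curve fit would be a common alternative. The function name and the data are hypothetical.

```python
import math


def estimate_ic50(concentrations, percent_inhibition):
    """Interpolate on a log10 concentration scale to the 50% inhibition point.

    Concentrations must be positive; inhibition is expected to rise with dose.
    """
    pairs = sorted(zip(concentrations, percent_inhibition))
    for (c_lo, y_lo), (c_hi, y_hi) in zip(pairs, pairs[1:]):
        if y_lo <= 50.0 <= y_hi and y_hi > y_lo:
            fraction = (50.0 - y_lo) / (y_hi - y_lo)
            log_lo, log_hi = math.log10(c_lo), math.log10(c_hi)
            return 10 ** (log_lo + fraction * (log_hi - log_lo))
    raise ValueError("data do not bracket 50% inhibition")


if __name__ == "__main__":
    # Hypothetical dose-response data: concentration in micromolar, inhibition in percent.
    doses = [0.1, 1.0, 10.0, 100.0]
    inhibition = [5.0, 25.0, 75.0, 95.0]
    print(f"estimated IC50 = {estimate_ic50(doses, inhibition):.2f} uM")
```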

As defined herein, a therapeutically effective amount of protein or polypeptide (i.e., an effective dosage) ranges from about 0.001 to 30 mg/kg body weight, sometimes about 0.01 to 25 mg/kg body weight, often about 0.1 to 20 mg/kg body weight, and more often about 1 to 10 mg/kg, 2 to 9 mg/kg, 3 to 8 mg/kg, 4 to 7 mg/kg, or 5 to 6 mg/kg body weight. The protein or polypeptide can be administered one time per week for between about 1 to 10 weeks, sometimes between 2 to 8 weeks, often between about 3 to 7 weeks, and more often for about 4, 5, or 6 weeks. The skilled artisan will appreciate that certain factors may influence the dosage and timing required to effectively treat a subject, including but not limited to the severity of the disease or disorder, previous treatments, the general health and/or age of the subject, and other diseases present. Moreover, treatment of a subject with a therapeutically effective amount of a protein, polypeptide, or antibody can include a single treatment, or sometimes can include a series of treatments.
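The weight-based dosing arithmetic above can be sketched as follows. This is an illustration of the calculation only, not dosing guidance: the range check uses the broadest protein range stated in the text (about 0.001 to 30 mg/kg) and the once-weekly schedule of 1 to 10 weeks, while the function names and example values are hypothetical.

```python
PROTEIN_RANGE_MG_PER_KG = (0.001, 30.0)  # broadest range stated in the text
WEEKS_RANGE = (1, 10)                    # once-weekly schedule stated in the text


def weekly_dose_mg(dose_mg_per_kg: float, body_weight_kg: float) -> float:
    """Absolute weekly dose in mg for a weight-based dose."""
    low, high = PROTEIN_RANGE_MG_PER_KG
    if not low <= dose_mg_per_kg <= high:
        raise ValueError("dose is outside the stated 0.001 to 30 mg/kg range")
    if body_weight_kg <= 0:
        raise ValueError("body weight must be positive")
    return dose_mg_per_kg * body_weight_kg


def cumulative_dose_mg(dose_mg_per_kg: float, body_weight_kg: float, weeks: int) -> float:
    """Total dose over a once-weekly course of the given length."""
    if not WEEKS_RANGE[0] <= weeks <= WEEKS_RANGE[1]:
        raise ValueError("course length is outside the stated 1 to 10 weeks")
    return weekly_dose_mg(dose_mg_per_kg, body_weight_kg) * weeks


if __name__ == "__main__":
    # Hypothetical subject: 70 kg, 5 mg/kg once weekly for 4 weeks.
    print(weekly_dose_mg(5.0, 70.0), cumulative_dose_mg(5.0, 70.0, 4))
```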

With regard to polypeptide formulations, featured herein is a method for treating breast cancer in a subject, which comprises contacting one or more cells in the subject with a first ICAM, MAPK10, KIAA0861, NUMA1 or GALE polypeptide, where the subject comprises a second ICAM, MAPK10, KIAA0861, NUMA1 or GALE polypeptide having one or more polymorphic variations associated with cancer, and where the first polypeptide comprises fewer polymorphic variations associated with cancer than the second polypeptide. The first and second polypeptides are encoded by a nucleic acid which comprises a nucleotide sequence selected from the group consisting of the nucleotide sequence of SEQ ID NO: 1-12; a nucleotide sequence which encodes a polypeptide consisting of an amino acid sequence encoded by a nucleotide sequence of SEQ ID NO: 1-12; a nucleotide sequence which encodes a polypeptide that is 90% or more identical to an amino acid sequence encoded by a nucleotide sequence of SEQ ID NO: 1-12; and a nucleotide sequence 90% or more identical to a nucleotide sequence of SEQ ID NO: 1-12. The subject is often a human.

For antibodies, a dosage of 0.1 mg/kg of body weight (generally 10 mg/kg to 20 mg/kg) is often utilized. If the antibody is to act in the brain, a dosage of 50 mg/kg to 100 mg/kg is often appropriate. Generally, partially human antibodies and fully human antibodies have a longer half-life within the human body than other antibodies. Accordingly, lower dosages and less frequent administration are often possible. Modifications such as lipidation can be used to stabilize antibodies and to enhance uptake and tissue penetration (e.g., into the brain). A method for lipidation of antibodies is described by Cruikshank et al., J. Acquired Immune Deficiency Syndromes and Human Retrovirology 14: 193 (1997).

Antibody conjugates can be used for modifying a given biological response; the drug moiety is not to be construed as limited to classical chemical therapeutic agents. For example, the drug moiety may be a protein or polypeptide possessing a desired biological activity. Such proteins may include, for example, a toxin such as abrin, ricin A, pseudomonas exotoxin, or diphtheria toxin; a polypeptide such as tumor necrosis factor, α-interferon, β-interferon, nerve growth factor, platelet derived growth factor, tissue plasminogen activator; or, biological response modifiers such as, for example, lymphokines, interleukin-1 (“IL-1”), interleukin-2 (“IL-2”), interleukin-6 (“IL-6”), granulocyte macrophage colony stimulating factor (“GM-CSF”), granulocyte colony stimulating factor (“G-CSF”), or other growth factors. Alternatively, an antibody can be conjugated to a second antibody to form an antibody heteroconjugate as described by Segal in U.S. Pat. No. 4,676,980.

For compounds, exemplary doses include milligram or microgram amounts of the compound per kilogram of subject or sample weight, for example, about 1 microgram per kilogram to about 500 milligrams per kilogram, about 100 micrograms per kilogram to about 5 milligrams per kilogram, or about 1 microgram per kilogram to about 50 micrograms per kilogram. It is understood that appropriate doses of a small molecule depend upon the potency of the small molecule with respect to the expression or activity to be modulated. When one or more of these small molecules is to be administered to an animal (e.g., a human) in order to modulate expression or activity of a polypeptide or nucleic acid described herein, a physician, veterinarian, or researcher may, for example, prescribe a relatively low dose at first, subsequently increasing the dose until an appropriate response is obtained. In addition, it is understood that the specific dose level for any particular animal subject will depend upon a variety of factors including the activity of the specific compound employed, the age, body weight, general health, gender, and diet of the subject, the time of administration, the route of administration, the rate of excretion, any drug combination, and the degree of expression or activity to be modulated.

ICAM, MAPK10, KIAA0861, NUMA1 or GALE nucleic acid molecules can be inserted into vectors and used in gene therapy methods for treating breast cancer. Featured herein is a method for treating breast cancer in a subject, which comprises contacting one or more cells in the subject with a first ICAM, MAPK10, KIAA0861, NUMA1 or GALE nucleic acid, where genomic DNA in the subject comprises a second ICAM, MAPK10, KIAA0861, NUMA1 or GALE nucleic acid comprising one or more polymorphic variations associated with breast cancer, and where the first nucleic acid comprises fewer polymorphic variations associated with breast cancer. The first and second nucleic acids typically comprise a nucleotide sequence selected from the group consisting of the nucleotide sequence of SEQ ID NO: 1-5; a nucleotide sequence which encodes a polypeptide consisting of an amino acid sequence encoded by a nucleotide sequence in SEQ ID NO: 1-5; a nucleotide sequence that is 90% or more identical to the nucleotide sequence of SEQ ID NO: 1-5; and a nucleotide sequence which encodes a polypeptide that is 90% or more identical to an amino acid sequence encoded by a nucleotide sequence in SEQ ID NO: 1-5. The subject often is a human.

Gene therapy vectors can be delivered to a subject by, for example, intravenous injection, local administration (see U.S. Pat. No. 5,328,470) or by stereotactic injection (see e.g., Chen et al., (1994) Proc. Natl. Acad. Sci. USA 91:3054-3057). Pharmaceutical preparations of gene therapy vectors can include a gene therapy vector in an acceptable diluent, or can comprise a slow release matrix in which the gene delivery vehicle is imbedded. Alternatively, where the complete gene delivery vector can be produced intact from recombinant cells (e.g., retroviral vectors) the pharmaceutical preparation can include one or more cells which produce the gene delivery system. Examples of gene delivery vectors are described herein.

Pharmaceutical compositions can be included in a container, pack, or dispenser together with instructions for administration.

Pharmaceutical compositions of active ingredients can be administered by any of the paths described herein for therapeutic and prophylactic methods for treating breast cancer. With regard to both prophylactic and therapeutic methods of treatment, such treatments may be specifically tailored or modified, based on knowledge obtained from pharmacogenomic analyses described herein. As used herein, the term “treatment” is defined as the application or administration of a therapeutic agent to a patient, or application or administration of a therapeutic agent to an isolated tissue or cell line from a patient, who has a disease, a symptom of disease or a predisposition toward a disease, with the purpose to cure, heal, alleviate, relieve, alter, remedy, ameliorate, improve or affect the disease, the symptoms of disease or the predisposition toward disease. A therapeutic agent includes, but is not limited to, small molecules, peptides, antibodies, ribozymes and antisense oligonucleotides.

Administration of a prophylactic agent can occur prior to the manifestation of symptoms characteristic of the ICAM, MAPK10, KIAA0861, NUMA1 or GALE aberrance, such that a disease or disorder is prevented or, alternatively, delayed in its progression. Depending on the type of ICAM, MAPK10, KIAA0861, NUMA1 or GALE aberrance, for example, an ICAM, MAPK10, KIAA0861, NUMA1 or GALE molecule, ICAM, MAPK10, KIAA0861, NUMA1 or GALE agonist, or ICAM, MAPK10, KIAA0861, NUMA1 or GALE antagonist agent can be used for treating the subject. The appropriate agent can be determined based on screening assays described herein.

As discussed, successful treatment of ICAM, MAPK10, KIAA0861, NUMA1 or GALE disorders can be brought about by techniques that serve to inhibit the expression or activity of target gene products. For example, compounds (e.g., an agent identified using an assay described above) that exhibit negative modulatory activity can be used to prevent and/or treat breast cancer. Such molecules can include, but are not limited to, peptides, phosphopeptides, small organic or inorganic molecules, or antibodies (including, for example, polyclonal, monoclonal, humanized, anti-idiotypic, chimeric or single chain antibodies, and Fab, F(ab′)2 and Fab expression library fragments, scFv molecules, and epitope-binding fragments thereof).

Further, antisense and ribozyme molecules that inhibit expression of the target gene can also be used to reduce the level of target gene expression, thus effectively reducing the level of target gene activity. Still further, triple helix molecules can be utilized in reducing the level of target gene activity. Antisense, ribozyme and triple helix molecules are discussed above.

It is possible that the use of antisense, ribozyme, and/or triple helix molecules to reduce or inhibit mutant gene expression can also reduce or inhibit the transcription (triple helix) and/or translation (antisense, ribozyme) of mRNA produced by normal target gene alleles, such that the concentration of normal target gene product present can be lower than is necessary for a normal phenotype. In such cases, nucleic acid molecules that encode and express target gene polypeptides exhibiting normal target gene activity can be introduced into cells via gene therapy methods. Alternatively, in instances where the target gene encodes an extracellular polypeptide, normal target gene polypeptide often is co-administered into the cell or tissue to maintain the requisite level of cellular or tissue target gene activity.

Another method by which nucleic acid molecules may be utilized in treating or preventing a disease characterized by ICAM, MAPK10, KIAA0861, NUMA1 or GALE expression is through the use of aptamer molecules specific for ICAM, MAPK10, KIAA0861, NUMA1 or GALE polypeptide. Aptamers are nucleic acid molecules having a tertiary structure which permits them to specifically bind to polypeptide ligands (see, e.g., Osborne, et al., Curr. Opin. Chem. Biol. 1 (1): 5-9 (1997); and Patel, D. J., Curr. Opin. Chem. Biol. June;1(1): 32-46 (1997)). Since nucleic acid molecules may in many cases be more conveniently introduced into target cells than therapeutic polypeptide molecules may be, aptamers offer a method by which ICAM, MAPK10, KIAA0861, NUMA1 or GALE polypeptide activity may be specifically decreased without the introduction of drugs or other molecules which may have pluripotent effects.

Antibodies can be generated that are both specific for target gene product and that reduce target gene product activity. Such antibodies may, therefore, be administered in instances whereby negative modulatory techniques are appropriate for the treatment of ICAM, MAPK10, KIAA0861, NUMA1 or GALE disorders. For a description of antibodies, see the Antibody section above.

In circumstances where injection of an animal or a human subject with an ICAM, MAPK10, KIAA0861, NUMA1 or GALE polypeptide or epitope for stimulating antibody production is harmful to the subject, it is possible to generate an immune response against ICAM, MAPK10, KIAA0861, NUMA1 or GALE through the use of anti-idiotypic antibodies (see, for example, Herlyn, D., Ann. Med.;31(1): 66-78 (1999); and Bhattacharya-Chatterjee & Foon, Cancer Treat. Res.; 94: 51-68 (1998)). If an anti-idiotypic antibody is introduced into a mammal or human subject, it should stimulate the production of anti-anti-idiotypic antibodies, which should be specific to the ICAM, MAPK10, KIAA0861, NUMA1 or GALE polypeptide. Vaccines directed to a disease characterized by ICAM, MAPK10, KIAA0861, NUMA1 or GALE expression may also be generated in this fashion.

In instances where the target antigen is intracellular and whole antibodies are used, internalizing antibodies may be utilized. Lipofectin or liposomes can be used to deliver the antibody or a fragment of the Fab region that binds to the target antigen into cells. Where fragments of the antibody are used, the smallest inhibitory fragment that binds to the target antigen often is utilized. For example, peptides having an amino acid sequence corresponding to the Fv region of the antibody can be used. Alternatively, single chain neutralizing antibodies that bind to intracellular target antigens can also be administered. Such single chain antibodies can be administered, for example, by expressing nucleotide sequences encoding single-chain antibodies within the target cell population (see e.g., Marasco et al., Proc. Natl. Acad. Sci. USA 90: 7889-7893 (1993)).

ICAM, MAPK10, KIAA0861, NUMA1 or GALE molecules and compounds that inhibit target gene expression, synthesis and/or activity can be administered to a patient at therapeutically effective doses to prevent, treat or ameliorate ICAM, MAPK10, KIAA0861, NUMA1 or GALE disorders. A therapeutically effective dose refers to that amount of the compound sufficient to result in amelioration of symptoms of the disorders.

Toxicity and therapeutic efficacy of such compounds can be determined by standard pharmaceutical procedures in cell cultures or experimental animals, e.g., for determining the LD₅₀ (the dose lethal to 50% of the population) and the ED₅₀ (the dose therapeutically effective in 50% of the population). The dose ratio between toxic and therapeutic effects is the therapeutic index and it can be expressed as the ratio LD₅₀/ED₅₀. Compounds that exhibit large therapeutic indices often are utilized. While compounds that exhibit toxic side effects can be used, care should be taken to design a delivery system that targets such compounds to the site of affected tissue in order to minimize potential damage to unaffected cells and, thereby, reduce side effects.
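
The ratio defined in the preceding paragraph can be illustrated with a short Python sketch. The LD₅₀ and ED₅₀ values used here are hypothetical and serve only to show the arithmetic; no measured values are given in the text.

```python
def therapeutic_index(ld50: float, ed50: float) -> float:
    """Return the therapeutic index LD50/ED50 (both doses in the same units)."""
    if ld50 <= 0 or ed50 <= 0:
        raise ValueError("LD50 and ED50 must be positive")
    return ld50 / ed50


# Hypothetical values in mg/kg, for illustration only.
example_ld50 = 400.0
example_ed50 = 20.0
print(therapeutic_index(example_ld50, example_ed50))
```

A larger ratio indicates a wider margin between the therapeutically effective dose and the lethal dose, which is why compounds with large therapeutic indices often are utilized.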

Data obtained from cell culture assays and animal studies can be used in formulating a range of dosage for use in humans. The dosage of such compounds often lies within a range of circulating concentrations that include the ED₅₀ with little or no toxicity. The dosage can vary within this range depending upon the dosage form employed and the route of administration utilized. For any compound used in a method described herein, the therapeutically effective dose can be estimated initially from cell culture assays. A dose can be formulated in animal models to achieve a circulating plasma concentration range that includes the IC₅₀ (i.e., the concentration of the test compound that achieves a half-maximal inhibition of symptoms) as determined in cell culture. Such information can be used to more accurately determine useful doses in humans. Levels in plasma can be measured, for example, by high performance liquid chromatography.
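
One simple way an IC₅₀ might be estimated from cell culture data is by interpolating, on a logarithmic concentration scale, between the two tested concentrations that bracket 50% inhibition. The Python sketch below assumes this interpolation approach, which is not specified in the text, and uses hypothetical concentrations and inhibition fractions.

```python
import math


def estimate_ic50(concentrations, inhibitions):
    """Estimate IC50 by log-linear interpolation.

    concentrations: ascending, positive test concentrations.
    inhibitions: fractional inhibition (0 to 1) measured at each concentration.
    """
    points = list(zip(concentrations, inhibitions))
    for (c_lo, i_lo), (c_hi, i_hi) in zip(points, points[1:]):
        # Find the first adjacent pair that brackets half-maximal inhibition.
        if i_lo <= 0.5 <= i_hi and i_hi > i_lo:
            fraction = (0.5 - i_lo) / (i_hi - i_lo)
            log_ic50 = math.log10(c_lo) + fraction * (
                math.log10(c_hi) - math.log10(c_lo)
            )
            return 10 ** log_ic50
    raise ValueError("data do not bracket 50% inhibition")


# Hypothetical dose-response data (arbitrary concentration units).
example_ic50 = estimate_ic50([1.0, 10.0, 100.0], [0.1, 0.3, 0.7])
print(round(example_ic50, 2))
```

In practice a full dose-response curve fit would typically be preferred; the interpolation is shown only because it makes the definition of the IC₅₀ concrete.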

Another example of effective dose determination for an individual is the ability to directly assay levels of “free” and “bound” compound in the serum of the test subject. Such assays may utilize antibody mimics and/or “biosensors” that have been created through molecular imprinting techniques. The compound which is able to modulate ICAM, MAPK10, KIAA0861, NUMA1 or GALE activity is used as a template, or “imprinting molecule”, to spatially organize polymerizable monomers prior to their polymerization with catalytic reagents. The subsequent removal of the imprinted molecule leaves a polymer matrix which contains a repeated “negative image” of the compound and is able to selectively rebind the molecule under biological assay conditions. A detailed review of this technique can be seen in Ansell et al., Current Opinion in Biotechnology 7: 89-94 (1996) and in Shea, Trends in Polymer Science 2: 166-173 (1994). Such “imprinted” affinity matrixes are amenable to ligand-binding assays, whereby the immobilized monoclonal antibody component is replaced by an appropriately imprinted matrix. An example of the use of such matrixes in this way can be seen in Vlatakis, et al., Nature 361: 645-647 (1993). Through the use of isotope-labeling, the “free” concentration of compound which modulates the expression or activity of ICAM, MAPK10, KIAA0861, NUMA1 or GALE can be readily monitored and used in calculations of IC₅₀. Such “imprinted” affinity matrixes can also be designed to include fluorescent groups whose photon-emitting properties measurably change upon local and selective binding of target compound. These changes can be readily assayed in real time using appropriate fiberoptic devices, in turn allowing the dose in a test subject to be quickly optimized based on its individual IC₅₀. A rudimentary example of such a “biosensor” is discussed in Kriz et al., Analytical Chemistry 67: 2142-2144 (1995).

Provided herein are methods of modulating ICAM, MAPK10, KIAA0861, NUMA1 or GALE expression or activity for therapeutic purposes. Accordingly, in an exemplary embodiment, the modulatory method involves contacting a cell with an ICAM, MAPK10, KIAA0861, NUMA1 or GALE molecule or with an agent that modulates one or more of the ICAM, MAPK10, KIAA0861, NUMA1 or GALE polypeptide activities associated with the cell. An agent that modulates ICAM, MAPK10, KIAA0861, NUMA1 or GALE polypeptide activity can be an agent as described herein, such as a nucleic acid or a polypeptide, a naturally-occurring target molecule of an ICAM, MAPK10, KIAA0861, NUMA1 or GALE polypeptide (e.g., an ICAM, MAPK10, KIAA0861, NUMA1 or GALE substrate or receptor), an ICAM, MAPK10, KIAA0861, NUMA1 or GALE antibody, an ICAM, MAPK10, KIAA0861, NUMA1 or GALE agonist or antagonist, a peptidomimetic of an ICAM, MAPK10, KIAA0861, NUMA1 or GALE agonist or antagonist, or other small molecule.

In one embodiment, the agent stimulates one or more ICAM, MAPK10, KIAA0861, NUMA1 or GALE activities. Examples of such stimulatory agents include active ICAM, MAPK10, KIAA0861, NUMA1 or GALE polypeptide and a nucleic acid molecule encoding ICAM, MAPK10, KIAA0861, NUMA1 or GALE. In another embodiment, the agent inhibits one or more ICAM, MAPK10, KIAA0861, NUMA1 or GALE activities. Examples of such inhibitory agents include antisense ICAM, MAPK10, KIAA0861, NUMA1 or GALE nucleic acid molecules, anti-ICAM, MAPK10, KIAA0861, NUMA1 or GALE antibodies, ICAM, MAPK10, KIAA0861, NUMA1 or GALE inhibitors, and competitive inhibitors that target ICAM, MAPK10, KIAA0861, NUMA1 or GALE. These modulatory methods can be performed in vitro (e.g., by culturing the cell with the agent) or, alternatively, in vivo (e.g., by administering the agent to a subject). As such, provided are methods of treating an individual afflicted with a disease or disorder characterized by aberrant or unwanted expression or activity of an ICAM, MAPK10, KIAA0861, NUMA1 or GALE polypeptide or nucleic acid molecule. In one embodiment, the method involves administering an agent (e.g., an agent identified by a screening assay described herein), or combination of agents that modulates (e.g., upregulates or downregulates) ICAM, MAPK10, KIAA0861, NUMA1 or GALE expression or activity. In another embodiment, the method involves administering an ICAM, MAPK10, KIAA0861, NUMA1 or GALE polypeptide or nucleic acid molecule as therapy to compensate for reduced, aberrant, or unwanted ICAM, MAPK10, KIAA0861, NUMA1 or GALE expression or activity.

Stimulation of ICAM, MAPK10, KIAA0861, NUMA1 or GALE activity is desirable in situations in which ICAM, MAPK10, KIAA0861, NUMA1 or GALE is abnormally downregulated and/or in which increased ICAM, MAPK10, KIAA0861, NUMA1 or GALE activity is likely to have a beneficial effect. Likewise, inhibition of ICAM, MAPK10, KIAA0861, NUMA1 or GALE activity is desirable in situations in which ICAM, MAPK10, KIAA0861, NUMA1 or GALE is abnormally upregulated and/or in which decreased ICAM, MAPK10, KIAA0861, NUMA1 or GALE activity is likely to have a beneficial effect.

Methods of Treatment

In another aspect, provided are methods for identifying a risk of cancer in an individual as described herein and, if a genetic predisposition is identified, treating that individual to delay or reduce or prevent the development of cancer. Such a procedure can be used to treat breast cancer. Optionally, treating an individual for cancer may include inhibiting cellular proliferation, inhibiting metastasis, inhibiting invasion, or preventing tumor formation or growth as defined herein. Suitable treatments to prevent or reduce or delay breast cancer focus on inhibiting additional cellular proliferation, inhibiting metastasis, inhibiting invasion, and preventing further tumor formation or growth. Treatment usually includes surgery followed by radiation therapy. Surgery may be a lumpectomy or a mastectomy (e.g., total, simple or radical). Even if the doctor removes all of the cancer that can be seen at the time of surgery, the patient may be given radiation therapy, chemotherapy, or hormone therapy after surgery to try to kill any cancer cells that may be left. Radiation therapy is the use of x-rays or other types of radiation to kill cancer cells and shrink tumors. Radiation therapy may use external radiation (using a machine outside the body) or internal radiation. Chemotherapy is the use of drugs to kill cancer cells. Chemotherapy may be taken by mouth, or it may be put into the body by inserting a needle into a vein or muscle. Hormone therapy often focuses on estrogen and progesterone, which are hormones that affect the way some cancers grow. If tests show that the cancer cells have estrogen and progesterone receptors (molecules found in some cancer cells to which estrogen and progesterone will attach), hormone therapy is used to block the way these hormones help the cancer grow. Hormone therapy with tamoxifen is often given to patients with early stages of breast cancer and those with metastatic breast cancer. Other types of treatment being tested in clinical trials include sentinel lymph node biopsy followed by surgery and high-dose chemotherapy with bone marrow transplantation and peripheral blood stem cell transplantation. Any preventative/therapeutic treatment known in the art may be prescribed and/or administered, including, for example, surgery, chemotherapy and/or radiation treatment, and any of the treatments may be used in combination with one another to treat or prevent breast cancer (e.g., surgery followed by radiation therapy).

Also provided are methods of preventing or treating cancer comprising providing an individual in need of such treatment with an ICAM, MAPK10, KIAA0861, NUMA1 or GALE inhibitor that reduces or inhibits the overexpression of mutant ICAM, MAPK10, KIAA0861, NUMA1 or GALE (e.g., an ICAM, MAPK10, KIAA0861, NUMA1 or GALE polynucleotide with an allele that is associated with cancer). Included herein are methods of reducing or blocking the expression of ICAM, MAPK10, KIAA0861, NUMA1 or GALE comprising providing or administering to individuals in need of reducing or blocking the expression of ICAM, MAPK10, KIAA0861, NUMA1 or GALE a pharmaceutical or physiologically acceptable composition comprising a molecule capable of inhibiting expression of ICAM, MAPK10, KIAA0861, NUMA1 or GALE, e.g., an siRNA molecule. Also included herein are methods of reducing or blocking the expression of secondary regulatory genes regulated by ICAM, MAPK10, KIAA0861, NUMA1 or GALE that play a role in oncogenesis, which comprise introducing competitive inhibitors that target the effect of ICAM, MAPK10, KIAA0861, NUMA1 or GALE on these regulatory genes or that block the binding of positive factors necessary for the expression of these regulatory genes.

The examples set forth below are intended to illustrate but not limit the invention.

Examples

In the following studies a group of subjects was selected according to specific parameters relating to breast cancer. Nucleic acid samples obtained from individuals in the study group were subjected to genetic analysis, which identified associations between breast cancer and certain polymorphic variants in ICAM region, MAPK10, KIAA0861, NUMA1/FLJ20625/LOC220074 region, and HT014/LOC148902/LYPLA2/GALE region loci (herein referred to as “target genes”, “target nucleotides”, “target polypeptides” or simply “targets”). In addition, methods are described for combining information from multiple SNPs from the target genes found to be independently associated with breast cancer status in a case-control study. The resulting model permits a powerful, more informative quantitation of the combined value of the SNPs for predicting breast cancer susceptibility.

Example 1

Samples and Pooling Strategies

Sample Selection

Blood samples were collected from individuals diagnosed with breast cancer, which were referred to as case samples. Also, blood samples were collected from individuals not diagnosed with breast cancer as gender- and age-matched controls. All of the samples were from individuals of German paternal and maternal descent. A database was created that listed all phenotypic trait information gathered from individuals for each case and control sample. Genomic DNA was extracted from each of the blood samples for genetic analyses.

DNA Extraction from Blood Samples

Six to ten milliliters of whole blood was transferred to a 50 ml tube containing 27 ml of red cell lysis solution (RCL). The tube was inverted until the contents were mixed. Each tube was incubated for 10 minutes at room temperature and inverted once during the incubation. The tubes were then centrifuged for 20 minutes at 3000×g and the supernatant was carefully poured off. 100-200 μl of residual liquid was left in the tube and was pipetted repeatedly to resuspend the pellet in the residual supernatant. White cell lysis solution (WCL) was added to the tube and pipetted repeatedly until completely mixed. While no incubation was normally required, the solution was incubated at 37° C. or room temperature if cell clumps were visible after mixing until the solution was homogeneous. 2 ml of protein precipitation solution was added to the cell lysate. The mixtures were vortexed vigorously at high speed for 20 sec to mix the protein precipitation solution uniformly with the cell lysate, and then centrifuged for 10 minutes at 3000×g. The supernatant containing the DNA was then poured into a clean 15 ml tube, which contained 7 ml of 100% isopropanol. The samples were mixed by inverting the tubes gently until white threads of DNA were visible. Samples were centrifuged for 3 minutes at 2000×g and the DNA was visible as a small white pellet. The supernatant was decanted and 5 ml of 70% ethanol was added to each tube. Each tube was inverted several times to wash the DNA pellet, and then centrifuged for 1 minute at 2000×g. The ethanol was decanted and each tube was drained on clean absorbent paper. The DNA was dried in the tube by inversion for 10 minutes, and then 1000 μl of 1× TE was added. The size of each sample was estimated, and less TE buffer was added during the following DNA hydration step if the sample was smaller. The DNA was allowed to rehydrate overnight at room temperature, and DNA samples were stored at 2-8° C.

To quantify DNA, samples were first placed on a hematology mixer for at least 1 hour. DNA was serially diluted (typically 1:80, 1:160, 1:320, and 1:640 dilutions) so that it would be within the measurable range of standards. 125 μl of diluted DNA was transferred to a clear U-bottom microtitre plate, and 125 μl of 1× TE buffer was transferred into each well using a multichannel pipette. The DNA and 1× TE were mixed by repeated pipetting at least 15 times, and then the plates were sealed. 50 μl of diluted DNA was added to wells A5-H12 of a black flat bottom microtitre plate. Standards were inverted six times to mix them, and then 50 μl of 1× TE buffer was pipetted into well A1, 1000 ng/ml of standard was pipetted into well A2, 500 ng/ml of standard was pipetted into well A3, and 250 ng/ml of standard was pipetted into well A4. PicoGreen (Molecular Probes, Eugene, Oreg.) was thawed and freshly diluted 1:200 according to the number of plates that were being measured. PicoGreen was vortexed and then 50 μl was pipetted into all wells of the black plate with the diluted DNA. DNA and PicoGreen were mixed by pipetting repeatedly at least 10 times with the multichannel pipette. The plate was placed into a Fluoroskan Ascent Machine (microplate fluorometer produced by Labsystems) and the samples were allowed to incubate for 3 minutes before the machine was run using filter pairs 485 nm excitation and 538 nm emission wavelengths. Samples having measured DNA concentrations of greater than 450 ng/μl were re-measured for confirmation. Samples having measured DNA concentrations of 20 ng/μl or less were re-measured for confirmation.
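
The fluorescence readings from such a plate may be converted to DNA concentrations with a standard curve. The Python sketch below assumes a simple linear least-squares fit through the four standard wells (0, 250, 500 and 1000 ng/ml) and multiplication by a single overall dilution factor; the fitting method, the fluorescence readings and the way the dilution factor is applied are illustrative assumptions, not details given in the text.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def stock_concentration_ng_per_ul(fluorescence, slope, intercept, dilution_factor):
    """Back-calculate the undiluted concentration (ng/ul) from a well reading."""
    well_ng_per_ml = (fluorescence - intercept) / slope
    return well_ng_per_ml * dilution_factor / 1000.0  # ng/ml -> ng/ul


# Standard concentrations from the text (TE blank = 0 ng/ml); readings are hypothetical.
standards_ng_per_ml = [0.0, 250.0, 500.0, 1000.0]
hypothetical_readings = [2.0, 27.0, 52.0, 102.0]
slope, intercept = fit_line(standards_ng_per_ml, hypothetical_readings)

# Hypothetical sample reading from a well prepared at an assumed 1:160 overall dilution.
conc = stock_concentration_ng_per_ul(42.0, slope, intercept, 160)
needs_remeasure = conc > 450 or conc <= 20  # re-measurement thresholds from the text
print(round(conc, 1), needs_remeasure)
```

The final line applies the two re-measurement thresholds stated above to the back-calculated concentration.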

Pooling Strategies

Samples were placed into one of two groups based on disease status. The two groups were female case groups and female control groups. A select set of samples from each group was utilized to generate pools, and one pool was created for each group. Each individual sample in a pool was represented by an equal amount of genomic DNA. For example, where 25 ng of genomic DNA was utilized in each PCR reaction and there were 200 individuals in each pool, each individual would provide 125 pg of genomic DNA. Inclusion or exclusion of samples for a pool was based upon the following criteria: the sample was derived from an individual characterized as Caucasian; the sample was derived from an individual of German paternal and maternal descent; the database included relevant phenotype information for the individual; case samples were derived from individuals diagnosed with breast cancer; control samples were derived from individuals free of cancer and with no family history of breast cancer; and sufficient genomic DNA was extracted from each blood sample for all allelotyping and genotyping reactions performed during the study. Phenotype information included pre- or post-menopausal status, familial predisposition, country of origin of mother and father, diagnosis with breast cancer (date of primary diagnosis, age of individual as of primary diagnosis, grade or stage of development, occurrence of metastases, e.g., lymph node metastases, organ metastases), condition of body tissue (skin tissue, breast tissue, ovary tissue, peritoneum tissue and myometrium), and method of treatment (surgery, chemotherapy, hormone therapy, radiation therapy). Samples that met these criteria were added to appropriate pools based on gender and disease status.
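
The per-individual contribution in the example above follows from dividing the DNA mass used per reaction by the number of individuals in the pool. A minimal Python sketch of that arithmetic is shown below; the function name is hypothetical.

```python
def dna_per_individual_pg(total_dna_ng: float, pool_size: int) -> float:
    """Mass of genomic DNA (pg) contributed by each individual in an equimolar pool."""
    if pool_size <= 0:
        raise ValueError("pool size must be positive")
    return total_dna_ng * 1000.0 / pool_size  # 1000 pg per ng


# Worked example from the text: 25 ng per PCR reaction, 200 individuals per pool.
print(dna_per_individual_pg(25.0, 200))
```

The same function can be applied to any pool size, for example to the pool sizes reported for this study.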

The selection process yielded the pools set forth in Table 1, which were used in the studies that follow:

TABLE 1

                                   Female CASE   Female CONTROL
Pool size (Number)                 272           276
Pool Criteria (ex: case/control)   case          control
Mean Age (ex: years)               59.6          55.4

Example 2

Association of Polymorphic Variants with Breast Cancer

A whole-genome screen was performed to identify particular SNPs associated with occurrence of breast cancer. As described in Example 1, two sets of samples were utilized, which included samples from female individuals having breast cancer (breast cancer cases) and samples from female individuals not having cancer (female controls). The initial screen of each pool was performed in an allelotyping study, in which certain samples in each group were pooled. By pooling DNA from each group, an allele frequency for each SNP in each group was calculated. These allele frequencies were then compared to one another. Particular SNPs were considered as being associated with breast cancer when allele frequency differences calculated between case and control pools were statistically significant. SNP disease association results obtained from the allelotyping study were then validated by genotyping each associated SNP across all samples from each pool. The results of the genotyping were then analyzed, allele frequencies for each group were calculated from the individual genotyping results, and a p-value was calculated to determine whether the case and control groups had statistically significant differences in allele frequencies for a particular SNP. When the genotyping results agreed with the original allelotyping results, the SNP disease association was considered validated at the genetic level.
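
The comparison of allele frequencies between the case and control groups can be sketched as follows. The text does not state which statistical test was used; this Python example assumes a Pearson chi-square test on a 2×2 table of allele counts (one degree of freedom, no continuity correction), and the genotype counts are hypothetical.

```python
import math


def allele_counts(n_aa: int, n_ab: int, n_bb: int):
    """Convert genotype counts into counts of allele A and allele B."""
    return 2 * n_aa + n_ab, 2 * n_bb + n_ab


def chi_square_2x2(a: int, b: int, c: int, d: int):
    """Pearson chi-square statistic and p-value (1 degree of freedom) for the
    table [[a, b], [c, d]]; all row and column totals must be non-zero."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Survival function of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical genotype counts for 272 cases and 276 controls.
case_a, case_b = allele_counts(100, 120, 52)
control_a, control_b = allele_counts(70, 130, 76)

case_freq_a = case_a / (case_a + case_b)
control_freq_a = control_a / (control_a + control_b)
chi2, p_value = chi_square_2x2(case_a, case_b, control_a, control_b)
print(f"A allele: cases {case_freq_a:.3f}, controls {control_freq_a:.3f}, p = {p_value:.4f}")
```

Under these assumptions, a SNP would be considered associated when the p-value falls below a chosen significance threshold; no particular threshold is implied here.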

SNP Panel Used for Genetic Analyses

A whole-genome SNP screen began with an initial screen of approximately 25,000 SNPs over each set of disease and control samples using a pooling approach. The pools studied in the screen are described in Example 1. The SNPs analyzed in this study were part of a set of 25,488 SNPs confirmed as being statistically polymorphic, as each is characterized as having a minor allele frequency of greater than 10%. The SNPs in the set reside in genes or in close proximity to genes, and many reside in gene exons. Specifically, SNPs in the set are located in exons, introns, and within 5,000 base-pairs upstream of a transcription start site of a gene. In addition, SNPs were selected according to the following criteria: they are located in ESTs; they are located in LocusLink or Ensembl genes; and they are located in Genomatix promoter predictions. SNPs in the set also were selected on the basis of even spacing across the genome, as depicted in Table 2.

A case-control study design using a whole genome association strategy involving approximately 28,000 single nucleotide polymorphisms (SNPs) was employed. Approximately 25,000 SNPs were evenly spaced in gene-based regions of the human genome with a median inter-marker distance of about 40,000 base pairs. Additionally, approximately 3,000 SNPs causing amino acid substitutions in genes described in the literature as candidates for various diseases were used. The case-control study samples were of female German origin (German paternal and maternal descent). The 548 individuals were divided nearly equally between two groups (276 female controls and 272 female cases). The whole genome association approach was first conducted on 2 DNA pools representing the 2 groups. Significant markers were confirmed by individual genotyping.

TABLE 2

General Statistics                       Spacing Statistics
Total # of SNPs         25,488           Median          37,058 bp
# of Exonic SNPs        >4,335 (17%)     Minimum*        1,000 bp
# SNPs with refSNP ID   20,776 (81%)     Maximum*        3,000,000 bp
Gene Coverage           >10,000          Mean            122,412 bp
Chromosome Coverage     All              Std Deviation   373,325 bp

*Excludes outliers

Allelotyping and Genotyping Results

The genetic studies summarized above and described in more detail below identified allelic variants associated with breast cancer. The allelic variants identified from the SNP panel described in Table 2 are summarized below in Table 3.

TABLE 3

SNP        Chromosome  Position  Contig          Contig    Sequence        Locus        Sequence    Allelic
Reference  Position    in FIG.   Identification  Position  Identification               Position    Variability
1056538    10248147    44247     NT_011295                 NM_000201       ICAM region              C/T
1541998    87342924    36424     NT_016354       11444849  NM_002753       MAPK10       intragenic  C/T
2001449    184330963   48563     NT_005962       18141399  NM_015078       KIAA0861     intragenic  G/C
673478     72021802    49002     NT_033927       1998133   NM_006185       NUMA1                    T/C
                                                           NM_017907       FLJ20625     downstream
                                                           NM_145309       LOC220074
4237       10291777    87877     NT_004391       454476    NM_000403       GALE         downstream  A/G
                                 NT_004610                 NM_020362       HT014
                                 NO. INFO.                 NO INFO.        LOC148902
                                 NT_004610                 NM_007260       LYPLA2

Table 3 includes information pertaining to the incident polymorphic variant associated with breast cancer identified herein. Public information pertaining to the polymorphism and the genomic sequence that includes the polymorphism are indicated. The genomic sequences identified in Table 3 may be accessed at the world wide web address ncbi.nih.gov/entrez/query.fcgi, for example, by using the publicly available SNP reference number (e.g., rs1541998). The chromosome position refers to the position of the SNP within NCBI's Genome Build 33, which may be accessed at the following world wide web address: ncbi.nlm.nih.gov/mapview/map_search.cgi?chr=hum_chr.inf&query=. The “Contig Position” provided in Table 3 corresponds to a nucleotide position set forth in the contig sequence, and designates the polymorphic site corresponding to the SNP reference number. The sequence containing the polymorphisms also may be referenced by the “Sequence Identification” set forth in Table 3. The “Sequence Identification” corresponds to cDNA sequence that encodes associated target polypeptides (e.g., NUMA1) of the invention. The position of the SNP within the cDNA sequence is provided in the “Sequence Position” column of Table 3. Also, the allelic variation at the polymorphic site and the allelic variant identified as associated with breast cancer is specified in Table 3. All nucleotide sequences referenced and accessed by the parameters set forth in Table 3 are incorporated herein by reference. The positions for these SNPs are indicated in the tables below in FIGS. 1, 2, 3 and 4, and the incident SNP for the GALE region is at position 174 in FIG. 5.

Assay for Verifying, Allelotyping, and Genotyping SNPs

A MassARRAY™ system (Sequenom, Inc.) was utilized to perform SNP genotyping in a high-throughput fashion. This genotyping platform was complemented by a homogeneous, single-tube assay method (hME™ or homogeneous MassEXTEND™ (Sequenom, Inc.)) in which two genotyping primers anneal to and amplify a genomic target surrounding a polymorphic site of interest. A third primer (the MassEXTEND™ primer), which is complementary to the amplified target up to but not including the polymorphism, was then enzymatically extended one or a few bases through the polymorphic site and then terminated.

For each polymorphism, SpectroDESIGNER™ software (Sequenom, Inc.) was used to generate a set of PCR primers and a MassEXTEND™ primer was used to genotype the polymorphism. Table 4 shows PCR primers and Table 5 shows extension primers used for analyzing polymorphisms. The initial PCR amplification reaction was performed in a 5 μl total volume containing 1× PCR buffer with 1.5 mM MgCl₂ (Qiagen), 200 μM each of dATP, dGTP, dCTP, dTTP (Gibco-BRL), 2.5 ng of genomic DNA, 0.1 units of HotStar DNA polymerase (Qiagen), and 200 nM each of forward and reverse PCR primers specific for the polymorphic region of interest.

TABLE 4
PCR Primers
Reference   Forward                  Reverse
SNP ID      PCR primer               PCR primer
1056538     GACAGCCACAGCTAGCGCAGA    TGTTTTCGCCCCCCAGGGTGAC
1541998     CTGATTATTCTGATGGTAATG    GCCCATGTTAACATTTTCTTC
2001449     ATGTCAAGTGCACCCACATG     AGGAAGAAACTGACGGAAGG
673478      TAATACAAAGGTGGCAGCAG     TTGACAAGGATAAGGACAAG
4237        GCACATGGCCACATTAACTGG    TGGCTGTGGAAATTGGGTCTTG

Samples were incubated at 95° C. for 15 minutes, followed by 45 cycles of 95° C. for 20 seconds, 56° C. for 30 seconds, and 72° C. for 1 minute, finishing with a 3 minute final extension at 72° C. Following amplification, shrimp alkaline phosphatase (SAP) (0.3 units in a 2 μl volume) (Amersham Pharmacia) was added to each reaction (total reaction volume was 7 μl) to remove any residual dNTPs that were not consumed in the PCR step. Samples were incubated for 20 minutes at 37° C., followed by 5 minutes at 85° C. to denature the SAP.

Once the SAP reaction was complete, a primer extension reaction was initiated by adding a polymorphism-specific MassEXTEND™ primer cocktail to each sample. Each MassEXTEND™ cocktail included a specific combination of dideoxynucleotides (ddNTPs) and deoxynucleotides (dNTPs) used to distinguish polymorphic alleles from one another. In Table 5, ddNTPs are shown and the fourth nucleotide not shown is the dNTP.

TABLE 5
Extend Primers
Reference   Extend                      Term
SNP ID      Probe                       Mix
1056538     CCCAGGGTGACGTTGCAGA         ACG
1541998     ATTATTCTGATGGTAATGATCCAG    ACG
2001449     CACATGCCTGCTCGCCCCC         ACT
673478      AAGGGGAGGTCGACTGGG          ACT
4237        GGCATCTGGCAGTCATGG          ACT

The MassEXTEND™ reaction was performed in a total volume of 9 μl, with the addition of 1× ThermoSequenase buffer, 0.576 units of ThermoSequenase (Amersham Pharmacia), 600 nM MassEXTEND™ primer, 2 mM of ddATP and/or ddCTP and/or ddGTP and/or ddTTP, and 2 mM of dATP or dCTP or dGTP or dTTP. The deoxy nucleotide (dNTP) used in the assay normally was complementary to the nucleotide at the polymorphic site in the amplicon. Samples were incubated at 94° C. for 2 minutes, followed by 55 cycles of 5 seconds at 94° C., 5 seconds at 52° C., and 5 seconds at 72° C.

Following incubation, samples were desalted by adding 16 μl of water (total reaction volume was 25 μl) and 3 mg of SpectroCLEAN™ sample cleaning beads (Sequenom, Inc.), and were allowed to incubate for 3 minutes with rotation. Samples were then robotically dispensed using a piezoelectric dispensing device (SpectroJET™ (Sequenom, Inc.)) onto either 96-spot or 384-spot silicon chips containing a matrix that crystallized each sample (SpectroCHIP® (Sequenom, Inc.)). Subsequently, MALDI-TOF mass spectrometry (Biflex and Autoflex MALDI-TOF mass spectrometers (Bruker Daltonics) can be used) and SpectroTYPER RT™ software (Sequenom, Inc.) were used to analyze and interpret the SNP genotype for each sample.
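The reaction volumes stated in the protocol above can be tallied step by step as a consistency check. The following is a minimal sketch, not part of the described assay software; the 2 μl volume of the MassEXTEND™ cocktail is an assumption inferred from the stated totals (9 μl after extension minus 7 μl after the SAP step), and the names `STEPS` and `running_volumes` are hypothetical.

```python
# Running reaction volume (in microliters) through the assay steps.
# The 2 ul cocktail volume is inferred from the stated 7 ul and 9 ul totals;
# it is not given explicitly in the text.
STEPS = [
    ("PCR amplification", 5.0),
    ("SAP addition", 2.0),
    ("MassEXTEND cocktail (inferred volume)", 2.0),
    ("water for desalting", 16.0),
]


def running_volumes(steps):
    """Return a list of (step name, cumulative volume in ul)."""
    total = 0.0
    out = []
    for name, added in steps:
        total += added
        out.append((name, total))
    return out


if __name__ == "__main__":
    for name, total in running_volumes(STEPS):
        print(f"{name}: {total:g} ul")
```

The cumulative totals (5, 7, 9 and 25 μl) agree with the total reaction volumes given in the text for each step.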

Genetic Analysis

Variations identified in the target genes are provided in their respective genomic sequences (see FIGS. 1-5). Minor allelic frequencies for these polymorphisms were verified as being 10% or greater by determining the allelic frequencies using the extension assay described above in a group of samples isolated from 92 individuals originating from the state of Utah in the United States, Venezuela and France (Coriell cell repositories).

Genotyping results are shown for female pools in Tables 6A and 6B. Table 6A shows the original genotyping results and Table 6B shows the genotyping results re-analyzed to remove duplicate individuals from the cases and controls (i.e., individuals who were erroneously included more than once as either cases or controls). Therefore, Table 6B represents a more accurate measure of the allele frequencies for these SNPs. In the subsequent tables, “AF” refers to allelic frequency; and “F case” and “F control” refer to female case and female control groups, respectively.

TABLE 6A
Reference   AF           AF                      Odds    Breast Cancer
SNP ID      F case       F control    p-value    Ratio   Assoc. Allele
1056538     C = 0.651    C = 0.564    0.0038     0.69    C
            T = 0.349    T = 0.436
1541998     T = 0.780    T = 0.839    0.0153     0.69    C
            C = 0.220    C = 0.161
2001449     G = 0.703    G = 0.780    0.0040     1.49    C
            C = 0.297    C = 0.220
673478      T = 0.919    T = 0.953    0.0238     1.74    C
            C = 0.081    C = 0.047
4237        A = 0.590    A = 0.530    0.0431     0.78    A
            G = 0.410    G = 0.470

TABLE 6B
Reference   AF           AF                      Odds    Breast Cancer
SNP ID      F case       F control    p-value    Ratio   Assoc. Allele
1056538     C = 0.658    C = 0.556    0.0012     0.65    C
            T = 0.342    T = 0.444
1541998     T = 0.771    T = 0.839    0.0070     0.65    C
            C = 0.229    C = 0.161
2001449     G = 0.693    G = 0.782    0.0012     1.59    C
            C = 0.307    C = 0.218
673478      T = 0.916    T = 0.953    0.0171     1.85    C
            C = 0.084    C = 0.047
4237        A = 0.584    A = 0.527    0.0704     0.79    A
            G = 0.416    G = 0.473

The single marker alleles set forth in Table 3 were considered validated, since the genotyping data for the females, males or both pools were significantly associated with breast cancer, and because the genotyping results agreed with the original allelotyping results. Particularly significant associations with breast cancer are indicated by a calculated p-value of less than 0.05 for genotype results, which are set forth in bold text.

Odds ratio results are shown in Tables 6A and 6B. An odds ratio is an unbiased estimate of relative risk which can be obtained from most case-control studies. Relative risk (RR) is an estimate of the likelihood of disease in the exposed group (susceptibility allele or genotype carriers) compared to the unexposed group (not carriers). It can be calculated by the following equation:

RR = I_A/I_a

I_A is the incidence of disease in the A carriers and I_a is the incidence of disease in the non-carriers.

RR>1 indicates the A allele increases disease susceptibility.

RR<1 indicates the a allele increases disease susceptibility.

For example, RR=1.5 indicates that carriers of the A allele have 1.5 times the risk of disease of non-carriers, i.e., they are 50% more likely to get the disease.
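The relative risk definition above can be sketched in a few lines. This is an illustration only; the function name `relative_risk` and the incidence values in the usage lines are hypothetical and are not data from the study.

```python
def relative_risk(incidence_carriers: float, incidence_noncarriers: float) -> float:
    """Relative risk: incidence in carriers divided by incidence in non-carriers."""
    if incidence_noncarriers <= 0:
        raise ValueError("incidence in non-carriers must be positive")
    return incidence_carriers / incidence_noncarriers


if __name__ == "__main__":
    # Hypothetical incidences chosen only to illustrate RR = 1.5.
    rr = relative_risk(0.03, 0.02)
    print(f"RR = {rr:.2f}, i.e. {100 * (rr - 1):.0f}% higher risk in carriers")
```

A value above 1 points to the carried allele as the susceptibility allele, and a value below 1 points to the alternative allele.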

Case-control studies do not allow the direct estimation of I_A and I_a; therefore, relative risk cannot be directly estimated. However, the odds ratio (OR) can be calculated using the following equation:

OR = (n_DA × n_da)/(n_dA × n_Da) = [p_DA(1 − p_dA)]/[p_dA(1 − p_DA)], or

OR = ((case f)/(1 − case f))/((control f)/(1 − control f)), where f = susceptibility allele frequency.

An odds ratio can be interpreted in the same way a relative risk is interpreted and can be directly estimated using the data from case-control studies, i.e., case and control allele frequencies. The higher the odds ratio value, the larger the effect that particular allele has on the development of breast cancer. Possessing an allele associated with a relatively high odds ratio translates to having a higher risk of developing or having breast cancer.
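The allele-frequency form of the odds ratio given above can be written as a short function. This is a minimal sketch; the function name `odds_ratio` is hypothetical, and the usage lines take the C allele frequencies reported for rs2001449 in Table 6A (0.297 in cases, 0.220 in controls) as input.

```python
def odds_ratio(case_f: float, control_f: float) -> float:
    """Odds ratio from the frequency of one allele in cases and in controls."""
    if not (0.0 < case_f < 1.0 and 0.0 < control_f < 1.0):
        raise ValueError("allele frequencies must lie strictly between 0 and 1")
    case_odds = case_f / (1.0 - case_f)
    control_odds = control_f / (1.0 - control_f)
    return case_odds / control_odds


if __name__ == "__main__":
    # C allele of rs2001449, frequencies as listed in Table 6A.
    print(f"OR = {odds_ratio(0.297, 0.220):.3f}")
```

With these inputs the result is close to the tabulated value of 1.49; a small difference in the last digit is to be expected when the calculation starts from frequencies that have already been rounded to three decimals.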

Example 3 Samples and Pooling Strategies for the Replication Samples

The SNPs of Table 3 were genotyped again in a collection of replication samples to further validate their association with breast cancer. Like the original study population described in Examples 1 and 2, the replication samples consisted of females diagnosed with breast cancer (cases) and females without cancer (controls). The case and control samples were selected and genotyped as described below.

Pooling Strategies

Samples were placed into one of two groups based on disease status. The two groups were female case groups and female control groups. A select set of samples from each group was utilized to generate pools, and one pool was created for each group. Each individual sample in a pool was represented by an equal amount of genomic DNA. For example, where 25 ng of genomic DNA was utilized in each PCR reaction and there were 190 individuals in each pool (i.e., 190 cases and 190 controls), each individual would provide approximately 132 pg of genomic DNA. Inclusion or exclusion of samples for a pool was based upon the following criteria: the sample was derived from a female individual characterized as Caucasian from Australia; case samples were derived from individuals diagnosed with breast cancer; control samples were derived from individuals free of cancer and with no family history of breast cancer; and sufficient genomic DNA was extracted from each blood sample for all allelotyping and genotyping reactions performed during the study. Samples in the pools also were age-matched. Samples that met these criteria were added to appropriate pools based on gender and disease status.
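Because every individual contributes an equal share of the pooled DNA, the per-individual amount is the total DNA per reaction divided by the pool size. The sketch below illustrates that arithmetic; the function name `per_individual_pg` is hypothetical.

```python
def per_individual_pg(total_ng: float, pool_size: int) -> float:
    """Picograms of genomic DNA contributed by each individual in an equimolar pool."""
    if pool_size <= 0:
        raise ValueError("pool size must be a positive integer")
    return total_ng * 1000.0 / pool_size  # 1 ng = 1000 pg


if __name__ == "__main__":
    for n in (190, 200):
        print(f"25 ng across {n} individuals: {per_individual_pg(25, n):.1f} pg each")
```

For 25 ng per reaction, a pool of 200 individuals corresponds to 125 pg per individual, and a pool of 190 individuals corresponds to roughly 132 pg per individual.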

The selection process yielded the pools set forth in Table 7, which were used in the studies that follow:

TABLE 7
                                    Female CASE    Female CONTROL
Pool size (Number)                  190            190
Pool Criteria (ex: case/control)    Case           control
Mean Age (ex: years)                64.5           **
** Each case was matched by a control within 5 years of age of the case.

The replication genotyping results are shown in Table 8. The odds ratio was calculated as described in Example 2.

TABLE 8
Reference   AF           AF                      Odds
SNP ID      F case       F control    p-value    Ratio
1056538     C = 0.650    C = 0.584    0.0624     0.75
            T = 0.350    T = 0.416
1541998     T = 0.820    T = 0.864    0.1010     0.72
            C = 0.180    C = 0.136
2001449     G = 0.685    G = 0.777    0.005      1.59
            C = 0.315    C = 0.223
673478      T = 0.927    T = 0.957    0.077      1.76
            C = 0.073    C = 0.043
4237        A = 0.632    A = 0.577    0.1260     1.26
            G = 0.368    G = 0.423

The absence of a statistically significant association in the replication cohort should not be interpreted as minimizing the value of the original finding. There are many reasons why a biologically derived association identified in a sample from one population would not replicate in a sample from another population. The most important reason is differences in population history. Due to bottlenecks and founder effects, there may be common disease predisposing alleles present in one population that are relatively rare in another, leading to a lack of association in the candidate region. Also, because common diseases such as breast cancer are the result of susceptibilities in many genes and many environmental risk factors, differences in population-specific genetic and environmental backgrounds could mask the effects of a biologically relevant allele. For these and other reasons, statistically strong results in the original, discovery sample that did not replicate in the replication sample may be further evaluated in additional replication cohorts and experimental systems.

Example 4 ICAM Region Proximal SNPs

It has been discovered that a polymorphic variation (rs1056538) in a region that encodes ICAM1, ICAM2 and ICAM5 is associated with the occurrence of breast cancer (see Examples 1 and 2). Subsequently, SNPs proximal to the incident SNP (rs1056538) were identified and allelotyped in breast cancer sample sets and control sample sets as described in Examples 1 and 2. Approximately seventy-five allelic variants located within the ICAM region were identified and allelotyped. The polymorphic variants are set forth in Table 9. The chromosome position provided in column four of Table 9 is based on Genome “Build 33” of NCBI's GenBank.

TABLE 9
dbSNP                    Position in   Chromosome   Allele
rs#         Chromosome   FIG. 1        Position     Variants
2884487     19           139           10204039     T/C
1059840     19           11799         10215699     A/T
11115       19           11851         10215751     T/C
1059849     19           11963         10215863     G/A
3093035     19           24282         10228182     A/G
ICAM_SNPA   19           26849         10230749     A/T
281428      19           29633         10233533     C/T
281431      19           31254         10235154     T/C
ICAM_SNPB   19           31967         10235867     G/C
2358581     19           32920         10236820     G/T
281434      19           33929         10237829     A/G
ICAM_SNPC   19           35599         10239499     G/C
1799969     19           36101         10240001     G/A
3093033     19           36340         10240240     G/A
ICAM_SNPD   19           36405         10240305     A/G
ICAM_SNPE   19           36517         10240417     T/C
ICAM_SNPF   19           36777         10240677     A/G
5498        19           36992         10240892     G/A
ICAM_SNPG   19           37645         10241545     T/C
1057981     19           37868         10241768     G/A
281436      19           38440         10242340     A/G
923366      19           38532         10242432     T/C
281437      19           38547         10242447     C/T
ICAM_SNPH   19           38712         10242612     T/C
281438      19           40684         10244584     T/G
3093029     19           40860         10244760     C/G
2569693     19           41213         10245113     C/T
281439      19           41419         10245319     G/C
281440      19           41613         10245513     G/A
ICAM_SNPI   19           42407         10246307     C/G
1333881     19           43440         10247340     T/C
1056538     19           44247         10248147     T/C
2228615     19           44677         10248577     A/G
2569702     19           45256         10249156     T/C
2569703     19           45536         10249436     C/G
ICAM_SNPJ   19           46153         10250053     C/T
2569707     19           47546         10251446     C/G
2916060     19           47697         10251597     A/C
885743      19           47944         10251844     A/T
ICAM_SNPK   19           48530         10252430     C/G
892188      19           51102         10255002     T/C
2291473     19           57090         10260990     T/C
281416      19           60093         10263993     A/G
281417      19           60439         10264339     T/C
281418      19           62694         10266594     G/C
430092      19           66260         10270160     C/T
368835      19           67295         10271195     A/G
2358583     19           67304         10271204     T/G
ICAM_SNPL   19           67731         10271631     G/T
1045384     19           68555         10272455     C/A
281427      19           70429         10274329     C/T
3745264     19           70875         10274775     T/G
281426      19           72360         10276260     G/A
281424      19           74228         10278128     C/T
281423      19           76802         10280702     C/T
281422      19           77664         10281564     T/C
281421      19           78803         10282703     A/G
281420      19           79263         10283163     A/G
3745263     19           80810         10284710     A/G
3745261     19           81020         10284920     T/C
3181049     19           82426         10286326     T/C
281412      19           82783         10286683     T/C
2230399     19           85912         10289812     C/G
2278442     19           86135         10290035     G/A
2304237     19           87877         10291777     T/C
281413      19           88043         10291943     G/A
1058154     19           88206         10292106     A/C
3176769     19           88343         10292243     T/C
2304240     19           90701         10294601     G/A
3176768     19           90974         10294874     A/G
3176767     19           91060         10294960     C/A
3176766     19           91087         10294987     C/T
ICAM_SNPM   19           91594         10295494     G/A
281415      19           92302         10296202     T/G
3176764     19           92384         10296284     A/G

Assay for Verifying and Allelotyping SNPs

The methods used to verify and allelotype the proximal SNPs of Table 9 are the same methods described in Examples 1 and 2 herein. The PCR primers and extend primers used in these assays are provided in Table 10 and Table 11, respectively.

TABLE 10 dbSNP Forward Reverse rs# PCR primer PCR primer 5498ACGTTGGATGCTCACAGAGCACATTCACGG ACGTTGGATGAGATCTTGAGGGCACCTACC 11115ACGTTGGATGAGGTGACACCTTCCTCGAAG ACGTTGGATGTGTGAAGCACCTCTTCTGAG 11115ACGTTGGATGGTCCAGGTGACACCTTCCTC ACGTTGGATGAAGCACCTCTTCTGAGCCAG 56901ACGTTGGATGGTCCAGGTGACACCTTCCTC ACGTTGGATGAAGCACCTCTTCTGAGCCAG 240914ACGTTGGATGTTCAACAAGCGAGTGACAGC ACGTTGGATGGTGCAGAGATGGGCTTTCTC 254615ACGTTGGATGTGTAGATGGTCACGTTCTCC ACGTTGGATGATCTGAGTCCTGATGTCACC 254615ACGTTGGATGTTGCAGCTTTAAGCTAAGGC ACGTTGGATGAGCCCAGGAGACTTAATTAC 272539ACGTTGGATGTACAGACCCCTCTACCCCTTC ACGTTGGATGAGGTGACACCTTCCTCGAAG 281412ACGTTGGATGTGACCTCAGGTGATTCACCC ACGTTGGATGGGTATACCTTTAGCTGGCTG 281413ACGTTGGATGTCAAAGCTCACAGTTCTCGG ACGTTGGATGACTTAGCGGGTCCTGCAAAC 281414ACGTTGGATGAAGGCACCTTCCTCTGTCAG ACGTTGGATGTGGGCCACAACACGGATGGTA 281415ACGTTGGATGGCACAAAGAGCTAAGGTAGG ACGTTGGATGGAATCCTGGATAGACAGTGG 281416ACGTTGGATGTAACGTAGAGCACAGGTGAG ACGTTGGATGCAACGCAAACACCAGTGTGG 281417ACGTTGGATGAAGAGACAGTGGAGAGGCTG ACGTTGGATGAGAGCCATCGGGTCCCAGCAA 281418ACGTTGGATGTGCGCTCAGTCAGCTTCCTC ACGTTGGATGAGTGTTAGCCGAGGGCAAGC 281420ACGTTGGATGCCAGGACTGTCTCTCTGTTT ACGTTGGATGATGACACTACAGCCTGAGCA 281421ACGTTGGATGAGTGTTGCTTTGTCACCCAG ACGTTGGATGAGGAGAATCGCTTGTACCTG 281422ACGTTGGATGAGAAATCCTCCTACCTTGGC ACGTTGGATGGCCCGGCCTCTACATAAAAT 281423ACGTTGGATGAACCTCAAGCTGCTTCACTG ACGTTGGATGGAGGAGCCCACCTTTAATGT 281424ACGTTGGATGACCTGTGTTTCTAGGTGTGC ACGTTGGATGCATGCCTGGGAAAAAACTCC 281426ACGTTGGATGATCCTCACACCTCAGTCTCC ACGTTGGATGAATGAGACTCCGTCTCTACC 281427ACGTTGGATGGACAATTGTAGTACCCAGCC ACGTTGGATGAGGAGAATCGCTTGAACCTG 281428ACGTTGGATGAGTAGCTGGAATTACAGGCG ACGTTGGATGGCCAACATGATGAAATCCCG 281431ACGTTGGATGACTGGGATTACAGGTGTGAG ACGTTGGATGGGAGAAATCTTGATGGAGGC 281432ACGTTGGATGAGCTGGGACTTTCCTTCTTG ACGTTGGATGCAGTAAATCCAGCCTTCAGC 281434ACGTTGGATGCCACGCCTGGCTAATTTTTG ACGTTGGATGGGTCAGGAGTTCAAGACCAG 281436ACGTTGGATGCATGGTTCACTGCAGTCTTG ACGTTGGATGTGTGGTGTTGTGAGCCTATG 281437ACGTTGGATGATAGGCTCACAACACCACAC ACGTTGGATGAACACAAAGGAAGTCTGGGC 281437ACGTTGGATGATAGGCTCACAACACCACAC 
ACGTTGGATGAACACAAAGGAAGTCTGGGC 281438ACGTTGGATGACCTGAGGTTTCCTCACTCAG ACGTTGGATGAGAGGTTTCTGTGACACCCG 281439ACGTTGGATGGCGGAGCCATACCTCTAAGC ACGTTGGATGTCGCTGGCACTTTCGTCCC 281440ACGTTGGATGCTGGCTGAGATGCCATGATA ACGTTGGATGATGGTGGGAGGAGCTAAATG 281440ACGTTGGATGGCCATGATAATAAGCTGGAC ACGTTGGATGTCTTAGTCCCCAAATGTATC 368835ACGTTGGATGGGTGGGAAAAAGACGTGAAG ACGTTGGATGAGAGGGAATTAAGGAGGTCC 378395ACGTTGGATGAATTCCGTGGGATGAGGAAT ACGTTGGATGACCGTGTTTTCCAGGCTCGCG 378395ACGTTGGATGACTTGGCCCCCTGCACTCACA ACGTTGGATGACCGTGTTTTCCAGGCTCGCG 430092ACGTTGGATGGTTGGGATTACAGGCATGAG ACGTTGGATGATCTGTTGCCTGTCAAGATG 473241ACGTTGGATGGCCATGATAATAAGCTGGAC ACGTTGGATGAAATGTATCCCCGCCCTAAG 547878ACGTTGGATGTACTCAGGAGGCTGAGGTG ACGTTGGATGCATGGTTCACTGCAGTCTTG 827786ACGTTGGATGGCGGAGCCATACCTCTAAGC ACGTTGGATGTCGCTGGCACTTTCGTCCC 827787ACGTTGGATGCTGGCTGAGATGCCATGATA ACGTTGGATGATGGTGGGAGGAGCTAAATG 885743ACGTTGGATGTGAGAGAAGGCGATCTTGAC ACGTTGGATGCCAATTCACAATCCACTGTG 885743ACGTTGGATGTGAGAGAAGGCGATCTTGAC ACGTTGGATGCCAATTCACAATCCACTGTG 892188ACGTTGGATGGTTTGTTTTTAGAGACAGGG ACGTTGGATGGTCAAAGCCACTTCCAGCTA 901886ACGTTGGATGCGATCTGGTCGCTCTGCAAG ACGTTGGATGGCCCCACCTTCTGTTCCAAG 923366ACGTTGGATGTCTGGGCAATGTTGCAAGAC ACGTTGGATGATAGGCTCACAACACCACAC 923366ACGTTGGATGTCTGGGCAATGTTGCAAGAC ACGTTGGATGATAGGCTCACAACACCACAC 1045384ACGTTGGATGGTGCAGAGATGGGCTTTCTC ACGTTGGATGAGATGGGCACAATGTCCGAC 1056538ACGTTGGATGACTGCCACAGCCACAGCTAG ACGTTGGATGTTTTCGCCCCCCAGGGTGA 1057981ACGTTGGATGGTACAACTGTACCTGGTGAC ACGTTGGATGAATGAACATAGGTCTCTGGC 1058154ACGTTGGATGTCCCTTCCATCCTCATTTTT ACGTTGGATGTGCAAGGCGCTAAACAAAAC 1059840ACGTTGGATGTCGGCCTGGCTCAGAAGAGG ACGTTGGATGACCCCTACCCCACGCTACCCA 1059849ACGTTGGATGGGAATGGATGCAGAAGCCCG ACGTTGGATGAAGCTGAGGCCACAGGGAG 1059849ACGTTGGATGAATGGATGCAGAAGCCCGTC ACGTTGGATGATTCCACGGAGGAAGCTGAG 1333881ACGTTGGATGATCAGCTCTACGCGATCTGG ACGTTGGATGTTCAGGCCCCACCTTCTGTTC 1799969ACGTTGGATGTCAACCTCTGGTCCCCCAGTG ACGTTGGATGAGGGGACCGTGGTCTGTTC 1799969ACGTTGGATGTTGCCATAGGTGACTGTGGG ACGTTGGATGTCCTAGAGGTGGACACGCAG 2075741ACGTTGGATGAAGATGCCAGTCCGTGGACC 
ACGTTGGATGCTGGAGACCCAGTGTCTCTC 2228615ACGTTGGATGGGGCAGATGGTGACAGTAAC ACGTTGGATGTGGAACTCCCTCCAGTGTGA 2228615ACGTTGGATGGGGCAGATGGTGACAGTAAC ACGTTGGATGTGGAACTCCCTCCAGTGTGA 2230399ACGTTGGATGAGCGGCAGTTACCATGTTAG ACGTTGGATGTTCTTCCCCCATTGCTTCTG 2230399ACGTTGGATGAGCGGCAGTTACCATGTTAG ACGTTGGATGTTCTTCCCCCATTGCTTCTG 2278442ACGTTGGATGGGTGATGGACATTGAGGGTG ACGTTGGATGTCCCTTCTGTCTCCAACCC 2278442ACGTTGGATGTCGTGGTGATGGACATTGAG ACGTTGGATGAAGTCAATATGCGTCCCTTC 2291473ACGTTGGATGAAGAGGCTATGTGGCAGATG ACGTTGGATGAGGGTGAAGCTGGGTTTAAC 2304237ACGTTGGATGTGGGCCAGAACTTCACCCTG ACGTTGGATGAAGCAGCACCACCGTGAGG 2304240ACGTTGGATGAATCTCAGCAACGTGACTGG ACGTTGGATGACACGGTGATGTTAGAGGAG 2304240ACGTTGGATGAATCTCAGCAACGTGACTGG ACGTTGGATGACACGGTGATGTTAGAGGAG 2358581ACGTTGGATGTAAGGCAGGAGGATGGAGTG ACGTTGGATGGACAGAGTCTCACTCTGTCG 2358583ACGTTGGATGAAGACGTGAAGAGACACACC ACGTTGGATGAGAGGGAATTAAGGAGGTCC 2569693ACGTTGGATGCTTGTTCTCGCGTGGATGTC ACGTTGGATGTACTCAGCGTGTGTGAGCTC 2569702ACGTTGGATGACCCTCCAGACCTTGAACCA ACGTTGGATGACGTAACGCTAACGGTGGAG 2569702ACGTTGGATGATACCCTACTCCTACTCTTC ACGTTGGATGTCAAGGACGTAACGCTAACG 2569703ACGTTGGATGTCAGGAAGCTCCCAGACAGA ACGTTGGATGATAACCCTTGGACGCCGATC 2569703ACGTTGGATGTTAGACGAAAAAGGCGCCAC ACGTTGGATGTTGTCCCTGCATAACCCTTG 2569707ACGTTGGATGTGAGCGTGGCAGGCGCCATG ACGTTGGATGGCGTGGCGCCCGTGCGCGT 2884487ACGTTGGATGTGTGGCAAATGATGGAACAG ACGTTGGATGCCAGAAGTTTGAGATCTGCC 2916060ACGTTGGATGGGCGAGGTATCTGAGAGGG ACGTTGGATGTACTCTGTCCCACTTCCGTC 3093029ACGTTGGATGGGCAGCTCTGATTGGATGTT ACGTTGGATGCTCCACAGTTGTTTGGCCTC 3093030ACGTTGGATGAGAGACCCAGAAGGTCATAG ACGTTGGATGCCTCCCCCAAGAAAACATTG 3093032ACGTTGGATGGGCCACTTCTTCTGTAAGTC ACGTTGGATGCATGAGGACATACAACTGGG 3093033ACGTTGGATGAAAGCCTGGAATAGGCACAC ACGTTGGATGTGCAGACAGTGACCATCTAC 3093035ACGTTGGATGGGAGACATAGCGAGATTCTG ACGTTGGATGTAGAAAGCAGTGCGATCTGG 3176764ACGTTGGATGAAATCGTTTGAACCCGGGAG ACGTTGGATGGTTTTGAGACAGAGTCTCAC 3176766ACGTTGGATGTTTCGGGCTGCAATGGTCCC ACGTTGGATGTAACACCTCTCTCCTTGTGC 3176767ACGTTGGATGCGGTCTCTGATGGATTCTAC ACGTTGGATGAACAGGCCCCACCATTTAAC 3176768ACGTTGGATGGAGAGGTGTTAAATGGTGGG 
ACGTTGGATGGGAACATGAAGAAGTCCTGG 3176769ACGTTGGATGTTCCTGTTTATGGCCAGACG ACGTTGGATGGTCTGAACCTGATTGGAGAG 3181049ACGTTGGATGATCTTCAGGGATGGTCACTC ACGTTGGATGGACAAATACAAAGGGACAGG 3745261ACGTTGGATGACACACAGCAGGGCATCCGT ACGTTGGATGCGCAATCAATGCTTTCCACC 3745263ACGTTGGATGTACATGAAGAAGGACTCGGC ACGTTGGATGATCCGTCCAGTGCACGTAGA 3745264ACGTTGGATGCAAAGTGCTAGGATCACAGG ACGTTGGATGACTGCCCCATAGAGTGGCAA FCH-0994ACGTTGGATGTTTTCGCCCCCCAGGGTGAC ACGTTGGATGACAGCCACAGCTAGCGCAGA

TABLE 11 dbSNP Extend Term rs# Primer Mix 5498 CAGAGCACATTCACGGTCACCTCGT 11115 AAGGGTGGGCGTGGGCCT ACT 11115 AAGGGTGGGCGTGGGCCT ACT 56901AAGGGTGGGCGTGGGCCT ACT 240914 ACAATGTCCGACTCCCACA ACT 254615CCAGGGTGACGTTGCAGA ACG 254615 TAAGGCAAAGTTCAGCTACTTA CGT 272539ACCCCGTACCACTGTTGA CGT 281412 GCTGGGATTATAAGCGTG ACT 281413GCTCACAGTTCTCGGCAGGAC ACG 281414 CCTTCCTCTGTCAGAATGGC ACG 281415GGTGATTTGGGGACAGCTGA ACT 281416 GGTCCACACCGACGCCAG ACT 281417CCCCTGCCCAGGACACCCC ACT 281418 TCAGCTTCCTCCCTCCCC ACT 281420ACTGTCTCTCTGTTTTTGAGAT ACT 281421 GCTTTGTCACCCAGGCTGGA ACT 281422CTGGGGAACTACAGGAATGC ACT 281423 GCCCACCCTCCATTCAGC ACG 281424TAGGTGTGCGTGTGTGTGTG ACG 281426 GAGCTGGGACCACAGGCA ACG 281427CTTTGTATACAATCTTCCCTC ACG 281428 GCGCCCAGCACCACGCC ACG 281431ACAGGTGTGAGCCACTGC ACT 281432 GGGAGTCATGGAGGGTTT ACT 281434TAGAGACGGGGTTTCACTAT ACT 281436 ACTGCAGTCTTGACCTTTTG ACT 281437TTTTTTTTCCAGAGACGGGGTCT ACG 281437 TTTTTCCAGAGACGGGGTCT ACG 281438CGAAGCCCCAGACTCTGTGTA ACT 281439 ACCCCTCCGGGTCAGCTCC ACT 281440TAATAAGCTGGACTCCGAGC ACG 281440 TAATAAGCTGGACTCCGAGC ACG 368835AGACGTGAAGAGACACACCT ACT 378395 GCCCGCGTCCTCCTCTCC ACT 378395GCCCGCGTCCTCCTCTCC ACT 430092 ATTACAGGCATGAGCCACTG ACG 473241ATAATAAGCTGGACTCCGAGC ACG 547878 GTGGGAGGATCACTTGAGC ACG 827786ACCCCTCCGGGTCAGCTCC ACT 827787 TAATAAGCTGGACTCCGAGC ACG 885743GACCCCTCTCTCCCTCCA CGT 885743 GACCCCTCTCTCCCTCCA CGT 892188TGGGCTGGAGCACAATGAC ACT 901886 GAGTCCGCAGCTCTTTGAAC ACT 923366TTGCAAGACCCCGTCTCTG ACT 923366 TTGCAAGACCCCGTCTCTG ACT 1045384CCAGTCCCCTGCTGTCTGT CGT 1056538 GAGGGTGCCAGGCAGCTG ACT 1057981TACCTGGTGACCTTGAATGTGAT ACG 1058154 CTTCCATCCTCATTTTTTTTTATT ACT 1059840GCTCAGAAGAGGTGCTTCAC CGT 1059849 CAGAAGCCCGTCTGGGCT ACG 1059849CAGAAGCCCGTCTGGGCT ACG 1333881 AGAGTCCGCAGCTCTTTGAAC ACT 1799969CCGAGACTGGGAACAGCC ACG 1799969 CCGAGACTGGGAACAGCC ACG 2075741GGACCATGGTGCACAGCA ACT 2228615 AGTAACCTGCGCAGCTGGG ACT 2228615GTAACCTGCGCAGCTGGG ACT 2230399 GTTACCATGTTAGGGAGGAGA ACT 2230399ACCATGTTAGGGAGGAGA ACT 2278442 GGACATTGAGGGTGAGCTAA ACG 
2278442ACATTGAGGGTGAGCTAA ACG 2291473 GGAGTGTCCCTGGACCCC ACT 2304237TGCGCTGCCAAGTGGAGG ACT 2304240 GCTCAGTGTACTGCAATGGCTC ACG 2304240AGTGTACTGCAATGGCTC ACG 2358581 CTTGCAGTGAGCCCAGATCG CGT 2358583AAGAGACACACCTAATTTGTGG ACT 2569693 CGCGTGGATGTCAGGGCC ACG 2569702CAGACCTTGAACCAGATAGAA ACT 2569702 ACCTTGAACCAGATAGAA ACT 2569703CTCCCAGACAGAGTGCATG ACT 2569703 TCCCAGACAGAGTGCATG ACT 2569707GGCGAGTACGAGTGCGCA ACT 2884487 AGAGACAGGGTCTCGCC ACT 2916060CTCCCTCTCGGTCCCGG ACT 3093029 AGTTTCCTATCCCAGCC ACT 3093030CCAGAACCTCAGGGTATG 3093032 CTTCTGTAAGTCTGTGGG 3093033 GGGTTCAGGTCACACCCACG 3093035 TTCTGTCTCAAAAAACAAAGC ACT 3176764 CCCGCCACTGCACTCCA ACT3176766 TCCTTCTGAGTTCTCCC ACG 3176767 TGGATTCTACCTTTCCC CGT 3176768TGTTGATGCGTGGGTTGGGG ACT 3176769 CGGGGTGGGTGGATCAA ACT 3181049ACTCCCTGCCCTGGCCC ACT 3745261 GCAGCTGCACCGACAGTTC ACT 3745263TCGGCTGCCCGTGCCAAGTC ACT 3745264 ATACCATGCCAGGCATT ACT FCH-0994CCCAGGGTGACGTTGCAGA ACG

Genetic Analysis of Allelotyping Results

Allelotyping results are shown for cases and controls in Table 12. The allele frequency for the A2 allele is noted in the fifth and sixth columns for breast cancer pools and control pools, respectively, where “AF” is allele frequency. The allele frequency for the A1 allele can be easily calculated by subtracting the A2 allele frequency from 1 (A1 AF=1−A2 AF). For example, the SNP rs2884487 has the following case and control allele frequencies: case A1 (T)=0.788; case A2 (C)=0.212; control A1 (T)=0.758; and control A2 (C)=0.242, where the nucleotide is provided in parentheses. SNPs with blank allele frequencies were untyped.
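The A1 frequencies that Table 12 leaves implicit follow directly from the relationship A1 AF = 1 − A2 AF. A minimal sketch, using the rs2884487 values quoted in the text; the function name `a1_frequency` is hypothetical.

```python
def a1_frequency(a2_af: float) -> float:
    """Frequency of the A1 allele of a biallelic SNP, given the A2 allele frequency."""
    if not 0.0 <= a2_af <= 1.0:
        raise ValueError("allele frequency must lie between 0 and 1")
    return 1.0 - a2_af


if __name__ == "__main__":
    # rs2884487 (T/C): A2 (C) frequencies as listed for cases and controls.
    for group, a2 in (("case", 0.212), ("control", 0.242)):
        print(f"{group}: A1 (T) = {a1_frequency(a2):.3f}, A2 (C) = {a2:.3f}")
```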

TABLE 12
dbSNP       Position in   Chromosome   A1/A2    A2 Case   A2 Control
rs#         FIG. 1        Position     Allele   AF        AF           p-Value
2884487     139           10204039     T/C      0.212     0.242        0.2425
1059840     11799         10215699     A/T      0.809     0.805        0.8545
11115       11851         10215751     T/C      0.434     0.379        0.0644
1059849     11963         10215863     G/A      0.243     0.194        0.0468
3093035     24282         10228182     A/G      0.889     0.914        0.1592
ICAM_SNPA   26849         10230749     A/T      Not Allelotyped
281428      29633         10233533     C/T      0.180     0.174        0.7908
281431      31254         10235154     T/C      0.107     0.109        0.8964
ICAM_SNPB   31967         10235867     G/C      0.375     0.382        0.8113
2358581     32920         10236820     G/T      0.097     0.074        0.1800
281434      33929         10237829     A/G      0.818     0.831        0.5765
ICAM_SNPC   35599         10239499     G/C      Not Allelotyped
1799969     36101         10240001     G/A      0.117     0.151        0.1036
3093033     36340         10240240     G/A      0.004     0.023        0.0051
ICAM_SNPD   36405         10240305     A/G      Not Allelotyped
ICAM_SNPE   36517         10240417     T/C      Not Allelotyped
ICAM_SNPF   36777         10240677     A/G      Not Allelotyped
5498        36992         10240892     G/A      0.554     0.487        0.0257
ICAM_SNPG   37645         10241545     T/C      0.684     0.732        0.0788
1057981     37868         10241768     G/A      0.978     0.994        0.0289
281436      38440         10242340     A/G      0.504     0.554        0.0977
923366      38532         10242432     T/C      0.597     0.553        0.1471
281437      38547         10242447     C/T      0.195     0.151        0.0521
ICAM_SNPH   38712         10242612     T/C      0.448     0.398        0.0970
281438      40684         10244584     T/G      0.235     0.200        0.1589
3093029     40860         10244760     C/G      0.089     0.081        0.6267
2569693     41213         10245113     C/T      0.297     0.355        0.0389
281439      41419         10245319     G/C      0.526     0.589        0.0352
281440      41613         10245513     G/A      0.736     0.746        0.7085
ICAM_SNPI   42407         10246307     C/G      0.325     0.394        0.0173
1333881     43440         10247340     T/C      0.336     0.360        0.3961
1056538     44247         10248147     T/C      0.592     0.489        0.0009
2228615     44677         10248577     A/G      0.595     0.519        0.0112
2569702     45256         10249156     T/C      0.294     0.357        0.0254
2569703     45536         10249436     C/G      0.438     0.476        0.2109
ICAM_SNPJ   46153         10250053     C/T      Not Allelotyped
2569707     47546         10251446     C/G      0.829     0.840        0.6238
2916060     47697         10251597     A/C      0.010     0.002        0.0702
885743      47944         10251844     A/T      Not Allelotyped
ICAM_SNPK   48530         10252430     C/G      Not Allelotyped
892188      51102         10255002     T/C      0.512     0.434        0.0104
2291473     57090         10260990     T/C      0.087     0.090        0.8770
281416      60093         10263993     A/G      0.546     0.505        0.1669
281417      60439         10264339     T/C      0.471     0.476        0.8531
281418      62694         10266594     G/C      0.914     0.934        0.1968
430092      66260         10270160     C/T      0.229     0.257        0.2758
368835      67295         10271195     A/G      0.703     0.727        0.3808
2358583     67304         10271204     T/G      0.304     0.326        0.4322
ICAM_SNPL   67731         10271631     G/T      0.705     0.669        0.2029
1045384     68555         10272455     C/A      0.180     0.187        0.7736
281427      70429         10274329     C/T      0.217     0.176        0.0916
3745264     70875         10274775     T/G      0.853     0.836        0.4285
281426      72360         10276260     G/A      0.565     0.685        0.0001
281424      74228         10278128     C/T      0.246     0.250        0.8929
281423      76802         10280702     C/T      0.192     0.197        0.8585
281422      77664         10281564     T/C      0.632     0.632        0.9791
281421      78803         10282703     A/G      0.920     0.925        0.7863
281420      79263         10283163     A/G      0.392     0.432        0.1774
3745263     80810         10284710     A/G      0.936     0.923        0.4005
3745261     81020         10284920     T/C      0.006     0.008        0.5979
3181049     82426         10286326     T/C      0.650     0.640        0.7183
281412      82783         10286683     T/C      0.408     0.352        0.0527
2230399     85912         10289812     C/G      0.826     0.838        0.5900
2278442     86135         10290035     G/A      0.581     0.594        0.6511
2304237     87877         10291777     T/C      0.102     0.093        0.6063
281413      88043         10291943     G/A      Not Allelotyped
1058154     88206         10292106     A/C      0.780     0.810        0.2203
3176769     88343         10292243     T/C      0.199     0.214        0.5539
2304240     90701         10294601     G/A      0.170     0.203        0.1661
3176768     90974         10294874     A/G      0.642     0.650        0.7681
3176767     91060         10294960     C/A      0.727     0.725        0.9511
3176766     91087         10294987     C/T      0.230     0.231        0.9513
ICAM_SNPM   91594         10295494     G/A      0.289     0.267        0.4128
281415      92302         10296202     T/G      0.754     0.766        0.6399
3176764     92384         10296284     A/G      0.899     0.894        0.8086
281412      NOT MAPPED                          0.154     0.156        0.9342
281413      NOT MAPPED                          0.299     0.302        0.9195
281415      NOT MAPPED                          0.664     0.684        0.4825

FIG. 14 shows the proximal SNPs in and around the ICAM region for females. The position of each SNP on the chromosome is presented on the x-axis. The y-axis gives the negative logarithm (base 10) of the p-value comparing the estimated allele frequency in the case group to that of the control group. The minor allele frequency of the control group for each SNP designated by an X or other symbol on the graphs in FIG. 14 can be determined by consulting Table 12. By proceeding down the Table from top to bottom and across the graphs from left to right, the allele frequency associated with each symbol shown can be determined.

To aid the interpretation, multiple lines have been added to the graph. The broken horizontal lines are drawn at two common significance levels, 0.05 and 0.01. The vertical broken lines are drawn every 20 kb to assist in the interpretation of distances between SNPs. Two other lines are drawn to expose linear trends in the association of SNPs to the disease. The light gray line (or generally bottom-most curve) is a nonlinear smoother through the data points on the graph using a local polynomial regression method (W. S. Cleveland, E. Grosse and W. M. Shyu (1992) Local regression models. Chapter 8 of Statistical Models in S eds J. M. Chambers and T. J. Hastie, Wadsworth & Brooks/Cole.). The black line (or generally top-most curve, e.g., see peak in left-most graph just to the left of position 92150000) provides a local test for excess statistical significance to identify regions of association. This was created by use of a 10 kb sliding window with 1 kb step sizes. Within each window, a chi-square goodness of fit test was applied to compare the proportion of SNPs that were significant at a test wise level of 0.01, to the proportion that would be expected by chance alone (0.05 for the methods used here). Resulting p-values that were less than 10⁻⁸ were truncated at that value.
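The sliding-window test described above can be sketched as follows. This is an illustration under stated assumptions, not the software used to produce FIG. 14: the function names, the input layout (a list of position and p-value pairs), the handling of empty windows, and the use of a one-degree-of-freedom chi-square statistic over the two categories (significant, not significant) are all assumptions. The window size, step size, significance level, chance proportion and truncation value are taken from the text and exposed as parameters. Note that a plain goodness of fit statistic also responds to a deficit of significant SNPs, whereas the text describes a test for excess.

```python
import math

P_FLOOR = 1e-8  # p-values below this are truncated, as stated in the text


def neg_log10(p: float) -> float:
    """Value plotted on the y-axis: negative base-10 logarithm of a p-value."""
    return -math.log10(p)


def chi2_sf_1df(stat: float) -> float:
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2.0))


def window_pvalue(n_snps: int, n_sig: int, expected_prop: float) -> float:
    """Goodness of fit p-value for the count of significant SNPs in one window."""
    if n_snps == 0:
        return 1.0  # assumption: an empty window carries no evidence
    exp_sig = n_snps * expected_prop
    exp_non = n_snps * (1.0 - expected_prop)
    stat = (n_sig - exp_sig) ** 2 / exp_sig
    stat += ((n_snps - n_sig) - exp_non) ** 2 / exp_non
    return max(chi2_sf_1df(stat), P_FLOOR)


def sliding_window_scan(snps, window=10_000, step=1_000,
                        alpha=0.01, expected_prop=0.05):
    """Scan (position, p_value) pairs; return a list of (window_start, p_value)."""
    if not snps:
        return []
    positions = [pos for pos, _ in snps]
    start, end = min(positions), max(positions)
    results = []
    w = start
    while w <= end:
        in_window = [p for pos, p in snps if w <= pos < w + window]
        n_sig = sum(1 for p in in_window if p < alpha)
        results.append((w, window_pvalue(len(in_window), n_sig, expected_prop)))
        w += step
    return results
```

On the plot, the two horizontal significance lines at 0.05 and 0.01 correspond to y-values of about 1.3 and 2.0 under the negative base-10 logarithm transform.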

Finally, the gene or genes present in the loci region of the proximal SNPs as annotated by Locus Link (http address: www.ncbi.nlm.nih.gov/LocusLink/) are provided on the graph. The exons and introns of the genes in the covered region are plotted below each graph at the appropriate chromosomal positions. The gene boundary is indicated by the broken horizontal line. The exon positions are shown as thick, unbroken bars. An arrow is placed at the 3′ end of each gene to show the direction of transcription.

Additional Genotyping

In addition to the ICAM region incident SNP, two other SNPs were genotyped in the discovery cohort. The discovery cohort is described in Example 1. The SNPs (rs1801714 and rs2228615) are located in the ICAM5 encoding portion of the sequence, were associated with breast cancer with a p-value of 0.0734 and 0.00236, respectively, and encoded non-synonymous amino acids (see Table 15).

The methods used to verify and genotype the two proximal SNPs of Table 15 are the same methods described in Examples 1 and 2 herein. The PCR primers and extend primers used in these assays are provided in Table 13 and Table 14, respectively.

TABLE 13
dbSNP     Second                           First
rs#       PCR primer                       PCR primer
1801714   ACGTTGGATGAGGGTTGCAGAGCAGGAGAA   ACGTTGGATGAGCCAAGGTGACGCTGAATG
2228615   ACGTTGGATGAGATGGTGACAGTAACCTGC   ACGTTGGATGTGGCATTTAGCTGAAGCTGG

TABLE 14
dbSNP     Extend                    Term
rs#       Primer                    Mix
1801714   CCTTCAGCAGGAGCTGGGCCCTC   ACT
2228615   TAACCTGCGCAGCTGGG         ACT

Table 15, below, shows the case and control allele frequencies along with the p-values for the SNPs genotyped. The disease associated allele of column 4 is in bold and the disease associated amino acid of column 5 is also in bold. The chromosome positions provided correspond to NCBI's Build 33.

TABLE 15
Genotyping Results
                                              Amino
dbSNP     Position in   Chromosome   Alleles   Acid     AF           AF                      Odds
rs#       FIG. 1        Position     (A1/A2)   Change   F case       F control    p-value    Ratio
1801714   36517         10240417     T/C       L352P    T = 0.010    T = 0.030    0.0734     2.260
                                                        C = 0.990    C = 0.970
2228615   44677         10248577     A/G       T348A    A = 0.340    A = 0.430    0.00236    1.470
                                                        G = 0.660    G = 0.570

Example 5 MAPK10 Proximal SNPs

It has been discovered that a polymorphic variation (rs1541998) in a region that encodes MAPK10 is associated with the occurrence of breast cancer (see Examples 1 and 2). Subsequently, SNPs proximal to the incident SNP (rs1541998) were identified and allelotyped in breast cancer sample sets and control sample sets as described in Examples 1 and 2. Approximately sixty-three allelic variants located within the MAPK10 region were identified and allelotyped. The polymorphic variants are set forth in Table 16. The chromosome position provided in column four of Table 16 is based on Genome “Build 33” of NCBI's GenBank.

TABLE 16 dbSNP Position in Chromosome Allele rs# Chromosome FIG. 2Position Variants 2575681 4 191 87306691 C/T 2575680 4 1490 87307990 A/G2589505 4 3781 87310281 C/T 2589504 4 3935 87310435 G/A 2164538 4 451287311012 T/C 2575679 4 7573 87314073 A/G MAP_SNP1 4 8467 87314967 A/T2869408 4 9001 87315501 C/G 934648 4 9732 87316232 T/C 2164537 4 1347787319977 T/C 2575678 4 13787 87320287 A/C 2575677 4 13903 87320403 G/C2589509 4 14355 87320855 T/G 2164536 4 15053 87321553 A/C 2164535 415459 87321959 T/A MAP_SNP2 4 17762 87324262 G/A 2589523 4 1948287325982 C/T 3755970 4 19631 87326131 A/C 2575675 4 22170 87328670 G/A1202 4 22688 87329188 T/C 1201 4 22748 87329248 A/G 2589516 4 2337687329876 G/T 2575674 4 23826 87330326 A/T 2589515 4 23868 87330368 G/CMAP_SNP3 4 24154 87330654 C/T 2589506 4 25972 87332472 G/A 1436524 426057 87332557 A/G 2575672 4 26361 87332861 C/T 2589518 4 26599 87333099G/A 3775164 4 26712 87333212 T/G 2589514 4 26812 87333312 G/A 3775166 427069 87333569 T/C 3775167 4 32421 87338921 C/T 3775169 4 33557 87340057T/C 2043650 4 35127 87341627 A/G 2043649 4 35222 87341722 T/G 3775170 435999 87342499 T/A 1541998 4 36424 87342924 C/T 2043648 4 37403 87343903A/G 2282598 4 39203 87345703 C/T 2282597 4 39226 87345726 G/A 3775173 441147 87347647 T/C 1469870 4 46176 87352676 G/C 1436522 4 50452 87356952T/C 1946733 4 52919 87359419 G/A 1436525 4 60214 87366714 G/A 3822037 461093 87367593 C/G 3775176 4 62572 87369072 G/A 1436527 4 63601 87370101C/T 1436529 4 65362 87371862 T/C 3775182 4 65863 87372363 T/G 3775183 466207 87372707 G/A 3775184 4 66339 87372839 A/G 3775187 4 69512 87376012T/C 1010778 4 70759 87377259 A/G 2282596 4 71217 87377717 T/A 2118044 473382 87379882 A/T 1469869 4 76307 87382807 C/T 1046706 4 Not mapped G/T2060588 4 Not mapped G/A 2289490 4 Not mapped C/T 2289491 4 Not mappedC/T 729511 4 Not mapped T/C

Assay for Verifying and Allelotyping SNPs

The methods used to verify and allelotype the proximal SNPs of Table 16 are the same methods described in Examples 1 and 2 herein. The PCR primers and extend primers used in these assays are provided in Table 17 and Table 18, respectively.

TABLE 17 dbSNP Forward Reverse rs# PCR primer PCR primer 958ACGTTGGATGATCCGCATGTGTCTGTATTC ACGTTGGATGCCCAGTGCATTATGTCTTGG 1201ACGTTGGATGTGCCAGTGCTCTGAAAACTG ACGTTGGATGCCTGTGGTCTCTATTGCTTG 1201ACGTTGGATGACAAGAATGCCAGTGCTCTG ACGTTGGATGCCTGTGGTCTCTATTGCTTG 1202ACGTTGGATGTAATCTCAGAATGGCAGCAC ACGTTGGATGTCAAGCAATAGAGACCACAG 10305ACGTTGGATGTTCAAGAATTATTTTATTGCAA ACGTTGGATGGGTGAAGCTTGAAAGCAAGC GTC729511 ACGTTGGATGTTAATGTAGTAAAAAGCACG ACGTTGGATGCTAGAGATCGGTTTTACACC934648 ACGTTGGATGACTGGTTGATACCATAGGAC ACGTTGGATGTGTACTGCTTTCATCCTTGC934648 ACGTTGGATGACTGGTTGATACCATAGGAC ACGTTGGATGTGTACTGCTTTCATCCTTGC1010778 ACGTTGGATGCAGAGGAAAGAAAACTGAAAG ACGTTGGATGGGATTTGTTCTTAATCTTTC1046706 ACGTTGGATGCAAATGGGAGTCAAGTCCTC ACGTTGGATGTTTTGCTCCTAAGCTGAAGG1436522 ACGTTGGATGGGAATTGAAATTGGCATTGC ACGTTGGATGATTGGAAGGAGGAAGCATAG1436524 ACGTTGGATGGAGTTGCCAGTAGCTTTGAG ACGTTGGATGATTGTTTCCAGGGTGCTCTG1436525 ACGTTGGATGGTGCAATCTTGGTTCACTGC ACGTTGGATGGCTTACACTAGCTACTTGGG1436527 ACGTTGGATGAGCACTGTGAGTTAAACCTG ACGTTGGATGCTGTATAGAGAGCTGTTTGC1436529 ACGTTGGATGCTATGGCAGCAGAAGAGTAG ACGTTGGATGAATGTTGGACCACATGTACG1469869 ACGTTGGATGCATGGCGAGGAAATCTGTTT ACGTTGGATGTTCGATATATCAGAGCCTTG1469870 ACGTTGGATGATACTGAGCTCCATTTTGGG ACGTTGGATGATGGCACAGTTTAGCATGTC1541998 ACGTTGGATGGCCCATGTTAACATTTTCTTC ACGTTGGATGCTGATTATTCTGATGGTAATG1946733 ACGTTGGATGGCAGGAGGATAGATCTGTAG ACGTTGGATGTAGCTTCTAAACATCTCTTG2043648 ACGTTGGATGTGGCTTTCTGAATGCTAGAG ACGTTGGATGAGGGCGGAATGATTTTTAGC2043649 ACGTTGGATGGCACTACATGGGACACAAAG ACGTTGGATGGTCCTACTAGTCCCTGTATG2043650 ACGTTGGATGGCTGAGGGAGAAATTGAGTG ACGTTGGATGCTGTGCCTTGCACATAGTAG2060588 ACGTTGGATGTTTCATTGCTCATGGATTAG ACGTTGGATGGATAAGTATTGGCTTAATCTG2118044 ACGTTGGATGAACAACTTGGCTAATTCTAC ACGTTGGATGGTCATTGCCTCTAGCTAGTG2164535 ACGTTGGATGACCAGCACTATTACCCATGC ACGTTGGATGGAATGATGTAAACGTTGGAG2164536 ACGTTGGATGGTGATGAAAACCATGTGAGC ACGTTGGATGCTGGAGAACAAAAGACCACC2164537 ACGTTGGATGCAAGGCAAAATGTTTCCAGC ACGTTGGATGAACACACTTAGTACCCACGC2164538 ACGTTGGATGTACTGCAGAGCTCTCCCTTG ACGTTGGATGAGAGGTCATCTTAATGGGCC2282596 
ACGTTGGATGTCATACTGATCAACCTGAAG ACGTTGGATGGGTGGCTTTGTGAAACCTTG2282597 ACGTTGGATGGCATGGTTCTGTTATAAGGC ACGTTGGATGACACTTGATTACAATGGCCC2282598 ACGTTGGATGCACGCCTAAGCAATTAATGAC ACGTTGGATGGTGAATGAAGGAAAAGTAGC2289490 ACGTTGGATGTGATTACTGGATTGGCTGGG ACGTTGGATGAAATGCCCTGAAGACCCAGC2289491 ACGTTGGATGGGAATGCATTGTAAACCAGG ACGTTGGATGACCTAGCCTTGCAGGAGGAC2575672 ACGTTGGATGATAGTGTTATCACATAGACC ACGTTGGATGCTCCAGGAGCAAGGATTATG2575674 ACGTTGGATGGTGGGTAACAGTTTTCAGGC ACGTTGGATGCTCTCCTACTCTTTACTGTC2575675 ACGTTGGATGTCGTACCTGCATAAGTGGTG ACGTTGGATGTTGGGAAGGTACTAACAGCG2575677 ACGTTGGATGGATGCCAATTTGGTTTGCCC ACGTTGGATGGAAGGATAAGCCACAGTGAG2575678 ACGTTGGATGCTTCAAGAGGCCATACAGAC ACGTTGGATGAAGCACCATTTGTGGCTCAG2575679 ACGTTGGATGCTTTCCTGCTGCATTTAGTG ACGTTGGATGTAAGCCAGTAACACATGCCG2575680 ACGTTGGATGGCCCTGAAGTTTTTGAATGG ACGTTGGATGGAGCCCAATACAATCAGGTG2575681 ACGTTGGATGTTCACTGCTAACATGCATGG ACGTTGGATGTTATATAGCCTTCTTTTCTC2589504 ACGTTGGATGGGATAGGAAACATATTAAGG ACGTTGGATGCTGTGTGATTTGGACAACCC2589505 ACGTTGGATGAGACTGTAGCCTAAATGAGG ACGTTGGATGCATTTTATGAGAAGATGCAC2589506 ACGTTGGATGGCAACTCAGCTAGCCTTTAC ACGTTGGATGTGTTATGCGGGAGTATAAGG2589509 ACGTTGGATGTGAATCATGGTTGCCTCCTG ACGTTGGATGATACGCAGGTTGTAGAGAGG2589514 ACGTTGGATGTATACATTGTCCTGATAGAG ACGTTGGATGCTTAAATGTCTCTAGAAAAGG2589515 ACGTTGGATGCACCTGTATACCAATTTGTAG ACGTTGGATGGCCAAACCATTTTGTGCCTG2589516 ACGTTGGATGCATACTCTGCCAAAGTTTTA ACGTTGGATGACTCACACTGTGGTTTGGGG2589518 ACGTTGGATGCCAGGCAAAAAGAATGACCG ACGTTGGATGAATGATATGCACCGATCTTC2589523 ACGTTGGATGTCATGTAGCTAAACAAAGGC ACGTTGGATGAGCAGGGTTAAATTTCCCAG2589525 ACGTTGGATGAAGAACATTGAAAGAAGCAG ACGTTGGATGGTATTTAAATTAGTGGTGTG2869408 ACGTTGGATGTCCCAGTACCTAAGTAGCAG ACGTTGGATGGCTTTGAATTACTCTGTCCC3755970 ACGTTGGATGTACAACTAGTATCTACAGAC ACGTTGGATGGTGACCATGTAGAAATCTGTG3775164 ACGTTGGATGGAACATGAAAAATTCATAAGC ACGTTGGATGAAGTTTCCCTGGTCGTGATC3775166 ACGTTGGATGCTGTTTTTCACCCCCGATTC ACGTTGGATGCTGAGGAGTCCATCATAGTG3775167 ACGTTGGATGGAAACAAGCAGATGTCATGG ACGTTGGATGGCTTCTGATTTTATATGGCAC3775169 ACGTTGGATGGGGAGAGAATGGTTGCATAT 
ACGTTGGATGATGCTGAACAACAGGATGGG3775170 ACGTTGGATGCCTAAGACCTATGCTCTCAC ACGTTGGATGCCCATTTTTGCTAGCAGGAG3775173 ACGTTGGATGCAAGAGGGCTGCTTTAAACC ACGTTGGATGTAAATTTGCAGAGGCCGTCG3775176 ACGTTGGATGAAAAGGTCACCAGTGACCTG ACGTTGGATGTAGTCCAAGTATTTCCCAAG3775182 ACGTTGGATGGATATCTCCCTCCTATTGGC ACGTTGGATGGCTGGACTCTATTAGGCCAT3775183 ACGTTGGATGGATCTCTGATCTTAGACCAC ACGTTGGATGTGCAGATATGTAGGCCAAGC3775184 ACGTTGGATGGACCAGCAACCATGATGAAG ACGTTGGATGGTTCTACTTTGACCACAGGC3775187 ACGTTGGATGTAGCACCTTCAGGATCTTTC ACGTTGGATGAATCATGATCCCAGGGCAAG3822037 ACGTTGGATGGTAATCCATAAACTGTGGGAG ACGTTGGATGTCCCACCCTGACTTCTTTGC

TABLE 18 dbSNP Extend Term rs# Primer Mix 958 TTATGTCTTGGTAGAGCC ACG1201 TCTATTGCTTGAAGAGAGAAAG ACT 1201 TTGCTTGAAGAGAGAAAG ACT 1202CCACCTGCACCATCGCCAT ACT 10305 AGCTAAATTGCAACAACA ACG 729511ATTGAACTGTATACTTAAAAATGC ACT 934648 ACTCTCCCACTGAGCAAGC ACT 934648ACTCTCCCACTGAGCAAGC ACT 1010778 TTGAAATACTGTTTGTTTCCCCAA ACT 1046706TCCTAAGCTGAAGGGAATGC CGT 1436522 GAGGAAGCATAGATTTGGTGT ACT 1436524CCAGGGTGCTCTGGTTTAATT ACT 1436525 GGCTTAAACCTGGGAGG ACG 1436527GAGCTGTTTGCATTTATAACTCA ACG 1436529 ACCACATGTACGTAAGGGGA ACT 1469869AAACACCATCTACTCTGAAGAA ACG 1469870 CTTATATTCTCTGTGGCACCAA ACT 1541998ATTATTCTGATGGTAATGATCCAG ACG 1946733 CTAAACATCTCTTGAATATTCTG ACG 2043648TGATTTTTAGCTAAAGGGGACA ACT 2043649 CCTCTTGTCTTATTATCCC ACT 2043650GCACATAGTAGTAGCTCA ACT 2060588 ATTGGCTTAATCTGTACATCAATT ACG 2118044GTGGGGTTAGATATTATTTCCTGA CGT 2164535 GATAAATGTGAGATTGAGAGA CGT 2164536CCTGTGTTCCTTTGTATTTATAT ACT 2164537 CGGCTTCTACTCTCTTATTCA ACT 2164538GTCACATTCTTACCCTC ACT 2282596 GAAACCTTGCATGAACT CGT 2282597CAGAAGCTACTTTTCCTTCA ACG 2282598 AGGAAAAGTAGCTTCTGGG ACG 2289490GCTAGACTCCTGATACC ACG 2289491 GGCTTGCTCCTGGTAATTTA ACG 2575672CAAGGATTATGTTAACCACT ACG 2575674 TATTCACACCTGCCTTC CGT 2575675GTTCTTGCCTGGTTTAC ACG 2575677 GGAATGAGGGCAACAGGA ACT 2575678TGTGGCTCAGGTCCAGG ACT 2575679 CTTCCTGGACATTAAATTGT ACT 2575680GGATGCATGGTTTCTCTAAT ACT 2575681 TTCTTTTCTCTTTTAGGAATCT ACG 2589504GTGCTAGGATCCTCAGT ACG 2589505 GTTTTAGCATAATTGCTTCTTTA ACG 2589506GAGAAGAAACCTGCCCA ACG 2589509 AGGGCTGCAGGGAAGAT ACT 2589514AGAAAAGGTTTTTAAAGTCCTC ACG 2589515 GAAAACTGTTACCCACTC ACT 2589516GGTTTGGGGGTTTCATT CGT 2589518 TGCACCGATCTTCAAATAAA ACG 2589523TTTCCCAGATTAATTATCAGATT ACG 2589525 TTAGTGGTGTGACTTGCA ACG 2869408CGAATCTCTTTAACTGCTG ACT 3755970 GGTTTCTTCTAAAACTGACCT ACT 3775164TTTTTTGGGATCTTGATATTTTTA ACT 3775166 AACTTATGAAAGAATATGAAGGAT ACT3775167 TAAGAGAAGTCTTCAGTGCTT ACG 3775169 GCAGAGATTTTTCAAAATCTCTAA ACT3775170 TTTTTAAAGCTGAAAATAAACCA CGT 3775173 GCCGTCGAACAAATACT ACT3775176 TATTTCCCAAGTGCCCA ACG 3775182 
CTGTCAGTTGCCTTAGG ACT 3775183AGTCAAGACCAGCTGGG ACG 3775184 CTCTTTCTTCTGATCCC ACT 3775187AGTGCATTACAGTGGTC ACT 3822037 TTTGCTTATTTCATAGAAGGAAT ACT

Genetic Analysis of Allelotyping Results

Allelotyping results are shown for cases and controls in Table 19. The allele frequency for the A2 allele is noted in the fifth and sixth columns for breast cancer pools and control pools, respectively, where “AF” is allele frequency. The allele frequency for the A1 allele can be easily calculated by subtracting the A2 allele frequency from 1 (A1 AF = 1 − A2 AF). For example, the SNP rs2575681 has the following case and control allele frequencies: case A1 (C)=0.611; case A2 (T)=0.389; control A1 (C)=0.632; and control A2 (T)=0.368, where the nucleotide is provided in parentheses. SNPs with blank allele frequencies were untyped.
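The A1 calculation described above can be illustrated with a minimal sketch. The function name `a1_frequency` and the rounding to three decimal places are assumptions made for this illustration only; the input values are the rs2575681 A2 frequencies reported in Table 19.

```python
def a1_frequency(a2_af: float) -> float:
    """Return the A1 allele frequency of a biallelic SNP from its A2 frequency.

    Hypothetical helper: applies A1 AF = 1 - A2 AF and rounds to the
    three decimal places used in the allelotyping tables.
    """
    if not 0.0 <= a2_af <= 1.0:
        raise ValueError("allele frequency must lie between 0 and 1")
    return round(1.0 - a2_af, 3)


# rs2575681 (A1 = C, A2 = T), A2 frequencies taken from Table 19.
case_a1 = a1_frequency(0.389)
control_a1 = a1_frequency(0.368)
print(f"case A1 (C) = {case_a1:.3f}; control A1 (C) = {control_a1:.3f}")
```

The two results can be compared against the case A1 and control A1 values quoted in the paragraph above.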

TABLE 19
dbSNP rs#   Position in FIG. 2   Chromosome Position   A1/A2 Allele   A2 Case AF   A2 Control AF   p-Value
2575681     191                  87306691              C/T            0.389        0.368           0.483
2575680     1490                 87307990              A/G            0.599        0.585           0.646
2589505     3781                 87310281              C/T            0.484        0.493           0.753
2589504     3935                 87310435              G/A            0.258        0.274           0.563
2164538     4512                 87311012              T/C            0.403        0.412           0.784
2575679     7573                 87314073              A/G            0.020        0.003           0.006
MAP_SNP1    8467                 87314967              A/T            0.704        0.682           0.441
2869408     9001                 87315501              C/G            0.708        0.716           0.777
934648      9732                 87316232              T/C            0.655        0.664           0.741
2164537     13477                87319977              T/C            0.262        0.306           0.109
2575678     13787                87320287              A/C            0.110        0.078           0.065
2575677     13903                87320403              G/C            0.920        0.991           0.000
2589509     14355                87320855              T/G            0.198        0.209           0.668
2164536     15053                87321553              A/C            0.623        0.605           0.534
2164535     15459                87321959              T/A            0.573        0.571           0.944
MAP_SNP2    17762                87324262              G/A            0.389        0.401           0.693
2589523     19482                87325982              C/T            0.779        0.813           0.156
3755970     19631                87326131              A/C            0.118        0.107           0.563
2575675     22170                87328670              G/A            0.656        0.694           0.176
1202        22688                87329188              T/C            0.764        0.762           0.933
1201        22748                87329248              A/G            0.128        0.117           0.579
2589516     23376                87329876              G/T            0.427        0.478           0.086
2575674     23826                87330326              A/T            0.583        0.666           0.004
2589515     23868                87330368              G/C            0.413        0.461           0.106
MAP_SNP3    24154                87330654              C/T            0.175        0.158           0.430
2589506     25972                87332472              G/A            0.435        0.491           0.063
1436524     26057                87332557              A/G            0.660        0.756           0.001
2575672     26361                87332861              C/T            0.274        0.185           0.001
2589518     26599                87333099              G/A            0.194        0.130           0.004
3775164     26712                87333212              T/G            0.073        0.080           0.644
2589514     26812                87333312              G/A            0.445        0.358           0.004
3775166     27069                87333569              T/C            0.249        0.167           0.001
3775167     32421                87338921              C/T            0.156        0.152           0.882
3775169     33557                87340057              T/C            0.169        0.130           0.067
2043650     35127                87341627              A/G            0.697        0.787           0.001
2043649     35222                87341722              T/G            0.698        0.763           0.016
3775170     35999                87342499              T/A            0.207        0.220           0.596
1541998     36424                87342924              C/T            0.715        0.772           0.029
2043648     37403                87343903              A/G            0.424        0.466           0.159
2282598     39203                87345703              C/T            0.022        0.031           0.324
2282597     39226                87345726              G/A            0.817        0.802           0.541
3775173     41147                87347647              T/C            0.158        0.148           0.645
1469870     46176                87352676              G/C            0.118        0.063           0.002
1436522     50452                87356952              T/C            0.165        0.120           0.036
1946733     52919                87359419              G/A            0.240        0.226           0.588
1436525     60214                87366714              G/A            0.054        0.039           0.212
3822037     61093                87367593              C/G            0.956        0.918           0.010
3775176     62572                87369072              G/A            0.969        0.909           0.000
1436527     63601                87370101              C/T            0.288        0.251           0.175
1436529     65362                87371862              T/C            0.555        0.534           0.481
3775182     65863                87372363              T/G            0.858        0.870           0.568
3775183     66207                87372707              G/A            0.565        0.617           0.080
3775184     66339                87372839              A/G            0.174        0.185           0.634
3775187     69512                87376012              T/C            0.307        0.291           0.575
1010778     70759                87377259              A/G            0.330        0.275           0.048
2282596     71217                87377717              T/A            0.735        0.738           0.892
2118044     73382                87379882              A/T            0.352        0.319           0.248
1469869     76307                87382807              C/T            0.388        0.335           0.069
1046706     Not mapped                                 G/T            0.538        0.533           0.866
2060588     Not mapped                                 G/A            0.188        0.135           0.016
2289490     Not mapped                                 C/T            0.780        0.812           0.187
2289491     Not mapped                                 C/T            0.960        0.971           0.297
729511      Not mapped                                 T/C            0.864        0.866           0.914

FIG. 15 shows the proximal SNPs in and around the MAPK10 region for females. The position of each SNP on the chromosome is presented on the x-axis. The y-axis gives the negative logarithm (base 10) of the p-value comparing the estimated allele frequency in the case group to that of the control group. The minor allele frequency of the control group for each SNP designated by an X or other symbol on the graphs in FIG. 15 can be determined by consulting Table 19. By proceeding down the Table from top to bottom and across the graphs from left to right the allele frequency associated with each symbol shown can be determined.
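The y-axis transformation described above can be sketched as follows. The function name `neg_log10_p` and the `floor` parameter are assumptions introduced for illustration; the floor only guards against p-values that are tabulated as 0.000 after rounding, for which the logarithm would otherwise be undefined.

```python
import math


def neg_log10_p(p_value: float, floor: float = 1e-8) -> float:
    """Return -log10(p), the quantity plotted on the y-axis.

    `floor` is a hypothetical lower bound applied before taking the
    logarithm so that a p-value rounded to 0.000 still yields a finite value.
    """
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("p-value must lie between 0 and 1")
    return -math.log10(max(p_value, floor))


# p-value of the incident SNP rs1541998 as reported in Table 19.
incident_height = neg_log10_p(0.029)
# Heights at which p = 0.05 and p = 0.01 fall on the same axis.
threshold_heights = [neg_log10_p(0.05), neg_log10_p(0.01)]
print(round(incident_height, 2), [round(h, 2) for h in threshold_heights])
```

On this scale smaller p-values plot higher, so a SNP whose point lies above the height for 0.01 has a p-value below 0.01.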

To aid the interpretation, multiple lines have been added to the graph. The broken horizontal lines are drawn at two common significance levels, 0.05 and 0.01. The vertical broken lines are drawn every 20 kb to assist in the interpretation of distances between SNPs. Two other lines are drawn to expose linear trends in the association of SNPs to the disease. The light gray line (or generally bottom-most curve) is a nonlinear smoother through the data points on the graph using a local polynomial regression method (W. S. Cleveland, E. Grosse and W. M. Shyu (1992) Local regression models. Chapter 8 of Statistical Models in S eds J. M. Chambers and T. J. Hastie, Wadsworth & Brooks/Cole.). The black line (or generally top-most curve, e.g., see peak in left-most graph just to the left of position 92150000) provides a local test for excess statistical significance to identify regions of association. This was created by use of a 10 kb sliding window with 1 kb step sizes. Within each window, a chi-square goodness of fit test was applied to compare the proportion of SNPs that were significant at a test-wise level of 0.01, to the proportion that would be expected by chance alone (0.05 for the methods used here). Resulting p-values that were less than 10⁻⁸ were truncated at that value.
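The sliding-window test described above might be sketched as below. This is an illustration under stated assumptions, not the implementation used to draw the figures: the function names are hypothetical, the goodness of fit statistic is taken over two categories (significant and not significant) with one degree of freedom, windows start at the first SNP position, and windows containing no SNPs are skipped. The window size, step size, significance level, chance proportion and truncation value are those stated in the text; the example positions and p-values are a subset of Table 19.

```python
import math
from typing import List, Tuple


def chi2_sf_1df(stat: float) -> float:
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2.0))


def window_test(n_snps: int, n_sig: int,
                expected: float = 0.05, floor: float = 1e-8) -> float:
    """Goodness of fit p-value for one window, truncated at `floor`."""
    exp_sig = n_snps * expected            # significant SNPs expected by chance
    exp_non = n_snps * (1.0 - expected)    # non-significant SNPs expected by chance
    stat = ((n_sig - exp_sig) ** 2 / exp_sig
            + ((n_snps - n_sig) - exp_non) ** 2 / exp_non)
    return max(chi2_sf_1df(stat), floor)


def sliding_window_scan(snps: List[Tuple[int, float]], window: int = 10_000,
                        step: int = 1_000, alpha: float = 0.01
                        ) -> List[Tuple[int, float]]:
    """Return (window start, truncated p-value) for every non-empty window."""
    positions = [pos for pos, _ in snps]
    start, last = min(positions), max(positions)
    results = []
    while start <= last:
        in_window = [p for pos, p in snps if start <= pos < start + window]
        if in_window:
            n_sig = sum(1 for p in in_window if p < alpha)
            results.append((start, window_test(len(in_window), n_sig)))
        start += step
    return results


# (position in FIG. 2, p-value) for seven adjacent SNPs from Table 19.
example = [(25972, 0.063), (26057, 0.001), (26361, 0.001), (26599, 0.004),
           (26712, 0.644), (26812, 0.004), (27069, 0.001)]
scan = sliding_window_scan(example)
for window_start, p in scan:
    print(window_start, f"{p:.2e}")
```

Because a two-category goodness of fit statistic is symmetric, this sketch does not distinguish an excess of significant SNPs from a deficit; a one-sided variant would be needed if only excess is of interest.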

Finally, the gene or genes present in the loci region of the proximal SNPs as annotated by Locus Link (world wide web address: ncbi.nlm.nih.gov/LocusLink/) are provided on the graph. The exons and introns of the genes in the covered region are plotted below each graph at the appropriate chromosomal positions. The gene boundary is indicated by the broken horizontal line. The exon positions are shown as thick, unbroken bars. An arrow is placed at the 3′ end of each gene to show the direction of transcription.

Example 6 KIAA0861 Proximal SNPs

It has been discovered that a polymorphic variation (rs2001449) in a gene encoding KIAA0861 is associated with the occurrence of breast cancer (see Examples 1 and 2). Subsequently, SNPs proximal to the incident SNP (rs2001449) were identified and allelotyped in breast cancer sample sets and control sample sets as described in Examples 1 and 2. A total of sixty-three allelic variants located within or nearby the KIAA0861 gene were identified and fifty-seven allelic variants were allelotyped. The polymorphic variants are set forth in Table 20. The chromosome position provided in column four of Table 20 is based on Genome “Build 33” of NCBI's GenBank.

TABLE 20 dbSNP Position in Chromosome Allele rs# Chromosome FIG. 3Position Variants 3811729 3 107 184282507 A/G 693208 3 2157 184284557C/G 488277 3 7300 184289700 T/C 645039 3 8233 184290633 T/C 670232 39647 184292047 A/T 575326 3 9868 184292268 T/C 575386 3 9889 184292289C/G 471365 3 10621 184293021 G/C 496251 3 11003 184293403 G/A 831246 311507 184293907 T/C 831247 3 11527 184293927 G/C 831249 3 11718184294118 C/T 831250 3 11808 184294208 T/C 831252 3 12024 184294424 T/C512071 3 13963 184296363 C/T 1502761 3 14300 184296700 A/C 681516 314361 184296761 C/T 619424 3 16287 184298687 T/G 529055 3 18635184301035 A/G 664010 3 19365 184301765 T/G 2653845 3 24953 184307353 G/A472795 3 25435 184307835 G/A 507079 3 26847 184309247 G/A 534333 3 27492184309892 T/C 831242 3 27620 184310020 T/C 536111 3 27678 184310078 C/T536213 3 27714 184310114 G/A 831245 3 29719 184312119 A/G 639690 3 30234184312634 T/C 684174 3 31909 184314309 T/C 571761 3 32153 184314553 C/G1983421 3 33572 184315972 T/C 2314415 3 42164 184324564 T/G 2103062 343925 184326325 A/G 6804951 3 45031 184327431 C/T 1403452 3 45655184328055 T/C 903950 3 48350 184330750 C/A 2017340 3 48418 184330818 A/G2001449 3 48563 184330963 G/C 3821522 3 53189 184335589 A/G 1390831 356468 184338868 T/G 1353566 3 59358 184341758 C/A 1813856 3 63761184346161 C/T 2272115 3 65931 184348331 G/A 3732603 3 67040 184349440G/C 940055 3 69491 184351891 A/C 2314730 3 83308 184365708 A/GKIAA0861_3732602 3 126545 184408945 C/T KIAA0861_2293203 3 137592184419992 A/T 7639705 3 147169 184429569 G/T

Assay for Verifying and Allelotyping SNPs

The methods used to verify and allelotype the sixty-three proximal SNPs of Table 20 are the same methods described in Examples 1 and 2 herein. The PCR primers and extend primers used in these assays are provided in Table 21 and Table 22, respectively.

TABLE 21 dbSNP Forward Reverse rs# PCR primer PCR primer 471365ACGTTGGATGTGAGTGACATTTGTGTCACC ACGTTGGATGCGGAGGATCTGAACAACTTC 472795ACGTTGGATGTCACCTGAGCATCAGACATG ACGTTGGATGATAGTGGAAGGAGAAACGGG 484315ACGTTGGATGGTTCTAATGTCACCCCTTCC ACGTTGGATGCAATGTGGCAAATTCTCTGG 488277ACGTTGGATGCACACATTCTTCTCAAGTGC ACGTTGGATGGGAGGGACACAATTTAACTC 496251ACGTTGGATGGGGAGTCATTCCAATACCAG ACGTTGGATGGGAGTGAAAGGTCATATTGG 502289ACGTTGGATGATCACTGCAACCTCCACCTC ACGTTGGATGTGTGGCATGAGCCTGTAATC 507079ACGTTGGATGAAGCCTCAGATGAGGCATAC ACGTTGGATGTCTGAAAGGGTTCAGGAAGG 512071ACGTTGGATGCAAATCACCCCTGACAATTC ACGTTGGATGACCAGCACACTCAGCTTTAG 519088ACGTTGGATGTCACCTGAGGTCAGGAGTTG ACGTTGGATGAGGTTTCACCATGTTAGCCG 529055ACGTTGGATGCTGCAGTTATCTGGGTGAGC ACGTTGGATGCCAGAACGTGGCTTGTTGGG 534333ACGTTGGATGCGTTGATGCACTGAAGGGAG ACGTTGGATGAGAGGCTAAATGTTGGCAGG 536111ACGTTGGATGTGTATCTGATCCCAGGTCAC ACGTTGGATGATTGGTGTTAAGTGGCGTGC 536213ACGTTGGATGTGAGGACCTCATTATTGGTG ACGTTGGATGCTGAGCAATCGAACTGCTAC 571761ACGTTGGATGAATATCCTAGGCTAGCAGTG ACGTTGGATGGTGCATAAATACATGAATAG 575326ACGTTGGATGACAGAGAGGCTTGGTCATAC ACGTTGGATGGGTGCTTGGTTGTGATTCTC 575386ACGTTGGATGATTCCTGCAGGTACTGTGTC ACGTTGGATGTGAGCCCAAAACTACTGCTG 578886ACGTTGGATGATGAAGTCTCGCTCTGTTGC ACGTTGGATGAATCACTTGAACCCAGGAGG 602646ACGTTGGATGTCTGGGACCGTTTACCGCA ACGTTGGATGGAGGAGACCCAGGGTATGAG 619424ACGTTGGATGACCGGGAGCTCCCAGTCTG ACGTTGGATGTGGGAATCGGTTGAGAGCCG 620722ACGTTGGATGTAAGGCGCCTGCAGAGGCGA ACGTTGGATGGCAGCAAAGAATTGCCCGGC 631755ACGTTGGATGATTTGTAGCTTTGCCCCAGC ACGTTGGATGTTTGTGAGCTCCAAGTTGGG 639690ACGTTGGATGGCATTTTACCACCATGTGGTT ACGTTGGATGCCTTCATGTTAATTCTGCCC 645039ACGTTGGATGCCTCTGAGTTCCCTCAGTTT ACGTTGGATGTTATCACCCTGCTGTCCTAC 664010ACGTTGGATGTGGTACCTCCAGGTAAAATG ACGTTGGATGTCCAGGCAGTCATTTTACCC 670232ACGTTGGATGGAAGGTGGAGCAGACATTAG ACGTTGGATGACCTTAGTTATACCAGGCAC 678454ACGTTGGATGTTAAGCCAGTCCCCACAAGG ACGTTGGATGTTCTCTGCGGAGGAAAGTGC 681516ACGTTGGATGCTCCTCCTCAGAGGACTAAC ACGTTGGATGAGCCCAAGGACTCATACAAC 683302ACGTTGGATGACCACGCCTGGCTAATTTTG ACGTTGGATGAAACATGGCGAAACCCGGTC 684174ACGTTGGATGCTTTACTGAGTGGGCAAACG 
ACGTTGGATGTCTAAGTGGAACTCAGCAGC 684846ACGTTGGATGAAGTTCCTCTGGTGGACAAC ACGTTGGATGACCACCAGATAAAATCCCTC 693208ACGTTGGATGTTTTGACAGGGCTTGAGTCC ACGTTGGATGGCTGAAAGCCCTCAATCTAG 831242ACGTTGGATGCAATTGCTCAGACCTTCACC ACGTTGGATGAATGCTAGAGACATTGCACC 831245ACGTTGGATGCTAGAATTACAGGTGCACAC ACGTTGGATGGCCAAGATGGTGAAACCTTG 831246ACGTTGGATGCACAATCTGTTAGAATGGTGG ACGTTGGATGCGTCAAGACTGAATGCATAG 831247ACGTTGGATGGAAAATATAGTCCTACACAA ACGTTGGATGCGTCAAGACTGAATGCATAG 831249ACGTTGGATGTCTCCTAATGCTATCCCTCC ACGTTGGATGAACACATGGACACAGGAAGG 831250ACGTTGGATGAGGGACATGGATGAAATTGG ACGTTGGATGAATTCCCACCTATGAGTGAG 831252ACGTTGGATGTGGGTATATACCCAAAGGAC ACGTTGGATGGGTTGGTTCCAAGTCTTTGC 903950ACGTTGGATGCTTCAGTTCAGGGAGAGATC ACGTTGGATGATAGGGCCCCCAGCATAAAA 940054ACGTTGGATGTGGTAGAGATGAGGTCTTGC ACGTTGGATGAAAGGCAGGAGGATTGCTTG 940055ACGTTGGATGTATGCTTCCAGTCTCTGACC ACGTTGGATGATAGGTAATCCAGTTGGGCC 1353566ACGTTGGATGGGTGTACTCTGCCATTTGTC ACGTTGGATGTGGAGGAGGTTCTAGTACCC 1390831ACGTTGGATGGTCTGCCAAAGTTCCCTTAG ACGTTGGATGAGGAAAGGGAAGAGAAACCG 1403452ACGTTGGATGCAGAAGTTAGGATGCAGATG ACGTTGGATGCCAGTAGAGATAGAATTTTGG 1502761ACGTTGGATGCAGAAATATGAAGGTGGCCC ACGTTGGATGACCTTGAGCTCTGAGCCCTT 1629673ACGTTGGATGAAGGATCACGTGAAGTCAGG ACGTTGGATGGGCACCATGTGTGGCTAATT 1813856ACGTTGGATGTCTGACTCCCTGATTCAAGC ACGTTGGATGACAAAAATTAGCCGGGCGTG 1983421ACGTTGGATGTCCAGGTGTTATGGAGTCAG ACGTTGGATGGGCTTCTTGTGCTGCTGTGT 2001449ACGTTGGATGATGTCAAGTGCACCCACATG ACGTTGGATGAGGAAGAAACTGACGGAAGG 2017340ACGTTGGATGTATTCCACTGCCTGCTTTCC ACGTTGGATGGAAAACAGGAGGAAGTGGTG 2030578ACGTTGGATGTTCTCCACTTTCTGGTCAAC ACGTTGGATGAACAACCTTACTTCATGCCC 2049280ACGTTGGATGCTTCCCAACATTTTCGGCTC ACGTTGGATGTGGATACTGAGGGTCAACTG 2103062ACGTTGGATGTGCAGCCCTCAACCTTTCAG ACGTTGGATGCCTTATTCAGTTACTATTACG 2272115ACGTTGGATGAGTTGTGAGTGATTTCAGGG ACGTTGGATGCAGGCCTTCTTGCTCTTATC 2272116ACGTTGGATGATCTGTTGCCTTAGGTTCAC ACGTTGGATGCTGTGCCTTCTGAGTAGTTC 2314415ACGTTGGATGGGCTGAGTAACAGTCCATTG ACGTTGGATGCTTACAGTATCCAAAAAGGG 2314730ACGTTGGATGCTCAGGTAATCTGCCTTCTC ACGTTGGATGCAGGGATAATGAGAACAAATC 2653845ACGTTGGATGATCACTTGGACTCAGGAAGC 
ACGTTGGATGAGTCTTGCTCTGTTTCCAGG 3732603ACGTTGGATGCTCTCAATTCCATCAGTCTC ACGTTGGATGCTTTACGAATTTCACAACAGG 3811728ACGTTGGATGACGCGCCACACCTCCCTAC ACGTTGGATGACGTGTCGGTCCCCTTTCAT 3811729ACGTTGGATGTGGGCGAGGTTCTGCAGCGT ACGTTGGATGGTTTCGTTTCTCCGGCACAG 3811731ACGTTGGATGTGCGGTAAACGGTCCCAGAG ACGTTGGATGAACTCCGCCGGCCCCCTCCTA 3821522ACGTTGGATGAACCCGCACTACAAGATTCC ACGTTGGATGGTCAGTCCCACATTCAGAAC

TABLE 22 dbSNP Extend Term rs# Primer Mix 471365 TCCAAAACCACCAGATAAAATCACT 472795 GACATGTCCCTCTCGGCCT ACG 484315 GGTATCAGGAAGAGTCA ACT 488277AGTGCACACAGAACATTTAACA ACT 496251 GTATTGTCCTCCAGTGA ACG 502289CTGTAATCCCAGCTACTC ACT 507079 GGCAATGTTTGCCCTTT ACG 512071CCCTGACAATTCCAAAACTAA ACG 519088 TTTCGCCATGTTTGCCAGG ACG 529055GAGCAGGCAGCACAAGT ACT 534333 GGGAGAAAGTAACAGGGTC ACT 536111GTGAAGGTCTGAGCAAT ACG 536213 TGGTGTTAAGTGGCGTG ACG 571761CTAGGCTAGCAGTGGGGTTG ACT 575326 TGGTCATACCCTTCAAG ACT 575386GAAGGGTATGACCAAGC ACT 578886 TGAGCCAAGATCATGCC CGT 602646CCAGGGTATGAGCGGAGGA ACT 619424 TGCGGCCCCCGCCGGGTT ACT 620722GAATTGCCCGGCTCCGAAT ACT 631755 TCCAAGTTGGGTCAAAG ACT 639690CTGCTATTCATTTGTGTAGA ACT 645039 CCCTCAGTTTTTATTGATTATT ACT 664010ACCTCCAGGTAAAATGATTAGTT ACT 670232 TGGGCAAACAAGCCCAT CGT 678454CAGGGATGGTAATTGAC ACG 681516 GGCCACCTTCATATTTC ACG 683302CAGGAGATCCAGACCATCCC ACG 684174 CTCTGATGTTACCTCCTCC ACT 684846AGTTGTTCAGATCCTCC ACT 693208 TCAATCTAGTGATAAGGAGGGT ACT 831242CAGGTGGATGGGGACAC ACT 831245 CACACCACCACGCCCGGCT ACT 831246AGAATGGTGGTGTATTTTTAC ACT 831247 TAGTCCTACACAATCTGTTA ACT 831249GCTATCCCTCCCCCCTTCCC ACG 831250 GACAAAAAACCAAACACC ACT 831252CTATAAAGACACATGCACAC ACT 903950 AGATCACATTGCCAACCCCCA CGT 940054AAAGTAGCAGTTTGAGACCA ACT 940055 GTCTCTGACCACTTGACCCA ACT 1353566TTGTCAGTTATGAGACCTTG CGT 1390831 GGTTAGGAAGAAATCTGTG ACT 1403452CACAGATGCTCATGGGTCC ACT 1502761 GGAGGAGGCACTATTAAT ACT 1629673TGTGGAGACAAGGTCTCACT ACT 1813856 TCAAGCGATTCTCCTGC ACG 1983421GGCAGGGAAGAGAAGAGC ACT 2001449 CACATGCCTGCTCGCCCCC ACT 2017340CCCTAAAGCATCTCACAGCCCC ACT 2030578 TCATGCCCATTGGGTTAG ACT 2049280GGGTCAACTGTACCAAG ACG 2103062 GAGATCATTTCTCCTTCAAC ACT 2272115ATACCTCAGAATACAGCTTTTTTT ACG 2272116 TCTCATTTCTCCTCTCTTTC ACG 2314415TAGTTGATGAAGATTTGGG ACT 2314730 TCCTTCTTCTCTGCTTT ACT 2653845AAGCGGAGGTTGCAGTGAGC ACG 3732603 CTCATTTCCACCCTTCT ACT 3811728GTCCCCTTTCATCTAAAC ACT 3811729 TCTGCAGCGTGCGGCGA ACT 3811731CCTACCCCTACGGAGCC ACT 3821522 GCATCTTCAGGAATCTTG ACT

Genetic Analysis of Allelotyping Results

Allelotyping results are shown for cases and controls in Table 23. The allele frequency for the A2 allele is noted in the fifth and sixth columns for breast cancer pools and control pools, respectively, where “AF” is allele frequency. The allele frequency for the A1 allele can be easily calculated by subtracting the A2 allele frequency from 1 (A1 AF = 1 − A2 AF). For example, the SNP in row 2 of Table 23 (rs3811729) has the following case and control allele frequencies: case A1 (A)=0.976; case A2 (G)=0.024; control A1 (A)=0.948; and control A2 (G)=0.052, where the nucleotide is provided in parentheses. SNPs with blank allele frequencies were untyped (“no AT”).

TABLE 23
dbSNP rs#         Position in FIG. 3  Chrom Position  Alleles (A1/A2)  A2 Case AF  A2 Control AF  p-Value
3811729           107                 184282507       A/G              0.024       0.052          0.017
693208            2157                184284557       C/G              0.186       0.207          0.368
3811731           not mapped                          A/G              0.690       0.641          0.084
602646            not mapped                          C/G              0.693       0.660          0.244
488277            7300                184289700       T/C              0.099       0.103          0.848
645039            8233                184290633       T/C              0.014       0.008          0.316
1629673           not mapped                          T/C              0.064       0.093          0.069
670232            9647                184292047       A/T              0.865       0.863          0.932
575326            9868                184292268       T/C              0.128       0.129          0.949
575386            9889                184292289       C/G              0.776       0.779          0.905
684846            not mapped                          C/G              0.799       0.745          0.033
471365            10621               184293021       G/C              0.746       0.740          0.815
496251            11003               184293403       G/A              0.156       0.160          0.853
831246            11507               184293907       T/C              0.773       0.802          0.243
831247            11527               184293927       G/C              0.829       0.826          0.879
831249            11718               184294118       C/T              0.071       0.051          0.160
831250            11808               184294208       T/C              0.682       0.697          0.589
831252            12024               184294424       T/C              0.752       0.762          0.695
512071            13963               184296363       C/T              0.616       0.642          0.367
1502761           14300               184296700       A/C              0.596       0.593          0.933
681516            14361               184296761       C/T              0.240       0.189          0.037
619424            16287               184298687       T/G              0.076       0.070          0.704
620722            not mapped                          C/T              0.779       0.819          0.100
529055            18635               184301035       A/G              0.601       0.637          0.219
664010            19365               184301765       T/G              0.455       0.394          0.039
678454            not mapped                          T/G              0.000       0.004          0.117
2653845           24953               184307353       G/A              0.175       0.168          0.775
472795            25435               184307835       G/A              0.082       0.077          0.756
502289            not mapped                          T/G              0.003       0.000          0.172
507079            26847               184309247       G/A              0.833       0.835          0.937
534333            27492               184309892       T/C              0.496       0.509          0.675
831242            27620               184310020       T/C              0.728       0.776          0.064
536111            27678               184310078       C/T              0.800       0.812          0.632
536213            27714               184310114       G/A              0.271       0.281          0.710
831245            29719               184312119       A/G              0.020       0.012          0.314
639690            30234               184312634       T/C              0.117       0.106          0.577
684174            31909               184314309       T/C              0.304       0.298          0.826
571761            32153               184314553       C/G              0.406       0.425          0.525
1983421           33572               184315972       T/C              0.433       0.425          0.791
2314415           42164               184324564       T/G              0.014       0.050          0.001
2103062           43925               184326325       A/G              0.328       0.361          0.256
6804951           45031               184327431       C/T              no AT       no AT          -
1403452           45655               184328055       T/C              0.025       0.072          0.001
903950            48350               184330750       C/A              0.577       0.594          0.556
2017340           48418               184330818       A/G              0.033       0.054          0.089
2001449           48563               184330963       G/C              0.262       0.205          0.025
3821522           53189               184335589       A/G              0.500       0.480          0.508
1390831           56468               184338868       T/G              0.944       0.923          0.160
1353566           59358               184341758       C/A              0.545       0.533          0.692
1813856           63761               184346161       C/T              0.040       0.041          0.933
2272115           65931               184348331       G/A              0.324       0.370          0.106
3732603           67040               184349440       G/C              0.228       0.209          0.429
940055            69491               184351891       A/C              0.225       0.198          0.272
2314730           83308               184365708       A/G              0.649       0.691          0.135
484315            not mapped                          C/G              0.256       0.234          0.404
KIAA0861_3732602  126545              184408945       C/T              no AT       no AT          -
KIAA0861_2293203  137592              184419992       A/T              no AT       no AT          -
7639705           147169              184429569       G/T              no AT       no AT          -

FIG. 16 shows the proximal SNPs in and around the KIAA0861 gene for females. As indicated, some of the SNPs were untyped. The position of each SNP on the chromosome is presented on the x-axis. The y-axis gives the negative logarithm (base 10) of the p-value comparing the estimated allele frequency in the case group to that of the control group. The minor allele frequency of the control group for each SNP designated by an X or other symbol on the graphs in FIG. 16 can be determined by consulting Table 23. By proceeding down the Table from top to bottom and across the graphs from left to right the allele frequency associated with each symbol shown can be determined.

To aid the interpretation, multiple lines have been added to the graph. The broken horizontal lines are drawn at two common significance levels, 0.05 and 0.01. The vertical broken lines are drawn every 20 kb to assist in the interpretation of distances between SNPs. Two other lines are drawn to expose linear trends in the association of SNPs to the disease. The light gray line (or generally bottom-most curve) is a nonlinear smoother through the data points on the graph using a local polynomial regression method (W. S. Cleveland, E. Grosse and W. M. Shyu (1992) Local regression models. Chapter 8 of Statistical Models in S eds J. M. Chambers and T. J. Hastie, Wadsworth & Brooks/Cole.). The black line (or generally top-most curve, e.g., see peak in left-most graph just to the left of position 92150000) provides a local test for excess statistical significance to identify regions of association. This was created by use of a 10 kb sliding window with 1 kb step sizes. Within each window, a chi-square goodness of fit test was applied to compare the proportion of SNPs that were significant at a test-wise level of 0.01, to the proportion that would be expected by chance alone (0.05 for the methods used here). Resulting p-values that were less than 10⁻⁸ were truncated at that value.

Finally, the gene or genes present in the loci region of the proximal SNPs as annotated by Locus Link (world wide web address: ncbi.nlm.nih.gov/LocusLink/) are provided on the graph. The exons and introns of the genes in the covered region are plotted below each graph at the appropriate chromosomal positions. The gene boundary is indicated by the broken horizontal line. The exon positions are shown as thick, unbroken bars. An arrow is placed at the 3′ end of each gene to show the direction of transcription.

Additional Genotyping

A total of five SNPs, including the incident SNP, were genotyped in the discovery cohort. The discovery cohort is described in Example 1. Four of the SNPs are non-synonymous, coding SNPs. Two of the SNPs (rs2001449 and rs6804951) were found to be significantly associated with breast cancer with p-values of 0.001 and 0.007, respectively. See Table 26.

The methods used to verify and genotype the five proximal SNPs of Table 26 are the same methods described in Examples 1 and 2 herein. The PCR primers and extend primers used in these assays are provided in Table 24 and Table 25, respectively.

TABLE 24 dbSNP Forward Reverse rs# PCR primer PCR primer rs7639705ACGTTGGATGTGTCAGAA ACGTTGGATGTTACAGGCAT AGCAAACCTGGC TGGAGACAGCrs2293203 ACGTTGGATGCTGCATAA ACGTTGGATGT TGGTGGCTTTGGGTGGGTGTTCACTTTGCAG rs3732602 ACGTTGGATGCCCTCTTG ACGTTGGATGGTCAGGAAGTTCT AGACAGAGTTGAACTCCCG rs2001449 ACGTTGGATGAGGAAGAAACGTTGGATGA ACTGACGGAAGG TGTCAAGTGCACCCACATG rs6804951ACGTTGGATGAAGATACGA ACGTTGGATGG ATGGAGCCTGG CAATAGGACTCCCTTTACC

TABLE 25
dbSNP rs#   Extend Primer          Term Mix
rs7639705   TGATGCACGTGGAGCAG      CGT
rs2293203   GCCCCTGGAAAAGGCCC      CGT
rs3732602   GGAAGATGATGAGACTAAAT   ACG
rs2001449   CACATGCCTGCTCGCCCCC    ACT
rs6804951   TCCCTTTACCTTCATGG      ACG

Table 26, below, shows the case and control allele frequencies along with the p-values for all of the SNPs genotyped. The disease associated allele of column 4 is in bold and the disease associated amino acid of column 5 is also in bold. The chromosome positions provided correspond to NCBI's Build 33. The amino acid change positions provided in column 5 correspond to KIAA0861 polypeptide sequence of FIG. 12.

TABLE 26 Genotyping Results
Rs number   Position in FIG. 3   Location within Gene   Alleles (A1/A2)   Amino Acid Change   A2 Case AF   A2 Control AF   p-Value   Odds Ratio
rs7639705   147169               Exon 7                 G/T               I1276L              0.805        0.811           0.794     1.04
rs2293203   137592               Exon 8                 A/T               Q295L               0.990        0.980           0.685     1.25
rs3732602   126545               Exon 11                C/T               S506F               monomorphic
rs2001449   48563                Intron 19              G/C               -                   0.307        0.218           0.001     1.59
rs6804951   45031                Exon 20                C/T               A819T               0.044        0.085           0.007     2.02
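The relationship between the allele frequencies and the odds ratios in Table 26 can be illustrated with a short sketch. It assumes, without confirmation from the text, that the odds ratio is an allelic odds ratio (odds of the A2 allele in cases divided by the odds in controls) reported in the direction that gives a value of at least 1. The function name `allelic_odds_ratio` is hypothetical. Because the tabulated frequencies are rounded, results recomputed this way are approximate and need not match every row of the table.

```python
def allelic_odds_ratio(case_af: float, control_af: float) -> float:
    """Approximate allelic odds ratio from case and control A2 frequencies.

    Assumption for illustration: the ratio is oriented so that the
    returned value is >= 1, whichever allele is enriched in cases.
    """
    for af in (case_af, control_af):
        if not 0.0 < af < 1.0:
            raise ValueError("allele frequencies must lie strictly between 0 and 1")
    case_odds = case_af / (1.0 - case_af)
    control_odds = control_af / (1.0 - control_af)
    ratio = case_odds / control_odds
    return ratio if ratio >= 1.0 else 1.0 / ratio


# A2 frequencies for the two significantly associated SNPs in Table 26.
or_rs2001449 = allelic_odds_ratio(0.307, 0.218)
or_rs6804951 = allelic_odds_ratio(0.044, 0.085)
print(f"rs2001449: {or_rs2001449:.2f}; rs6804951: {or_rs6804951:.2f}")
```

For these two SNPs the recomputed values can be compared with the Odds Ratio column; rows with allele frequencies near 0 or 1 are far more sensitive to rounding, so larger discrepancies there would not be surprising.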

Example 7 NUMA1 Proximal SNPs

It has been discovered that a polymorphic variation (rs673478) in the NUMA1/FLJ20625/LOC220074 region is associated with the occurrence of breast cancer (see Examples 1 and 2). Subsequently, SNPs proximal to the incident SNP (rs673478) were identified and allelotyped in breast cancer sample sets and control sample sets as described in Examples 1 and 2. Approximately sixty-three allelic variants located within the NUMA1/FLJ20625/LOC220074 region were identified and allelotyped. The polymorphic variants are set forth in Table 27. The chromosome position provided in column four of Table 27 is based on Genome “Build 33” of NCBI's GenBank.

TABLE 27

dbSNP rs#    Chromosome   Position in FIG. 4   Chromosome Position   Allele Variants
1894003      11           174                  71972974              T/C
2390981      11           815                  71973615              G/A
1939242      11           3480                 71976280              C/T
1894004      11           9715                 71982515              T/C
645603       11           14755                71987555              G/A
661290       11           15912                71988712              A/G
679926       11           19834                71992634              A/G
567026       11           19850                71992650              G/A
678193       11           20171                71992971              T/G
560777       11           20500                71993300              C/T
676721       11           20536                71993336              C/T
585228       11           23187                71995987              C/G
674319       11           25289                71998089              C/T
675185       11           25470                71998270              T/G
575871       11           28720                72001520              A/G
547208       11           29566                72002366              C/T
2511075      11           30155                72002955              T/C
642573       11           30752                72003552              C/G
671681       11           32710                72005510              C/T
541022       11           32954                72005754              A/G
2511076      11           33725                72006525              G/A
3018308      11           33842                72006642              T/C
671132       11           36345                72009145              G/A
552966       11           38115                72010915              A/C
607446       11           39150                72011950              C/T
3018302      11           40840                72013640              T/G
3018301      11           41969                72014769              A/G
2511114      11           42045                72014845              C/T
548961       11           43785                72016585              G/A
575831       11           44444                72017244              A/G
577435       11           44579                72017379              T/C
495567       11           45386                72018186              C/T
493065       11           46827                72019627              A/G
597513       11           47320                72020120              A/T
598835       11           47625                72020425              T/C
610004       11           47837                72020637              T/C
610041       11           47866                72020666              A/G
673478       11           49002                72021802              T/C
670802       11           49566                72022366              T/G
2511116      11           52058                72024858              C/T
NUMA1_SNP1   11           52249                72025049              A/C
517837       11           52257                72025057              C/T
615000       11           52850                72025650              T/G
482013       11           53860                72026660              C/T
NUMA1_SNP2   11           54052                72026852              T/C
2250866      11           54411                72027211              T/C
2511078      11           55098                72027898              G/A
2508858      11           55303                72028103              C/G
681069       11           59398                72032198              A/G
595062       11           59533                72032333              A/G
542752       11           60542                72033342              A/T
2508856      11           61541                72034341              C/T
832658       11           62309                72035109              G/A
3750908      11           72299                72045099              C/T
3793938      11           73031                72045831              C/T
2276396      11           73803                72046603              G/C
1806778      11           80950                72053750              T/C
4073394      11           82137                72054937              A/G
471547       11           96077                72068877              G/T
606136       11           96470                72069270              A/G
532360       11           98116                72070916              G/T
703781       11           98184                72070984              A/C
476753       11           132952               72105752              A/G

Assay for Verifying and Allelotyping SNPs

The methods used to verify and allelotype the proximal SNPs of Table 27 are the same methods described in Examples 1 and 2 herein. The PCR primers and extend primers used in these assays are provided in Table 28 and Table 29, respectively.

TABLE 28 dbSNP Forward Reverse rs# PCR primer PCR primer 744293ACGTTGGATGTCTGCAGACAGTGGCCAATG ACGTTGGATGAGGGCCCAGGATCACAATAG 750789ACGTTGGATGTTCATCTGGTAAGTCCCACC ACGTTGGATGTGAAACAAGAGAGGCCCTTC 1939110ACGTTGGATGTCTTTAGGTCCAGGATTCCC ACGTTGGATGTATAGTCAGCATCGTCCCTG 2005192ACGTTGGATGCCCTCAGAGTTTGGACATAT ACGTTGGATGTATCCAAAATGCAGACACAGSNP00004859 ACGTTGGATGGTGTTTATCCCAACCCTTCCACGTTGGATGGGAGGAAATACAGCCTGTTC 744292 ACGTTGGATGATCCTAGAGGACTGGGAAAGACGTTGGATGCTGCTTCTGTTCCCACAATG 754490 ACGTTGGATGAAGGGTGGAGAACTCATGGGACGTTGGATGACCCCTATTTTGAAGCAGGC 872619 ACGTTGGATGTTCACACCAAGGTGTTACTGACGTTGGATGCACAATAATGTGTTCAGGGC 1807014 ACGTTGGATGCTGGGCAACAAGAGTGAAACACGTTGGATGGCCCAAAACCACTGAGATTC 1815753 ACGTTGGATGTAGAGTGAAGACAGAGCTCCACGTTGGATGATAAACCCAGGCATTCGAGC 1892893 ACGTTGGATGTCCTATGAAGATTCATCTGCACGTTGGATGGTCCAGAGTTTTAGACTCAAG 1939111 ACGTTGGATGTCCTTAACCTTATTGGTGGCACGTTGGATGGTTGGGTTCAGTAGAAGAGA 1939112 ACGTTGGATGAGCCACCAATAAGGTTAAGGACGTTGGATGTGTCTCTCACTTCCTCAACC 1939113 ACGTTGGATGAGACACACAAGGCAAGGTTCACGTTGGATGCCAGAGAGGAGTCTGTCTAG 1939114 ACGTTGGATGGAAAACATTGGTCCAGGCAGACGTTGGATGCAAGAACCCAGGCATCAATG 1939115 ACGTTGGATGGACCACGGAATCCTTTTTTCAACGTTGGATGGCTCAAATTCTGTTCTTTAG 1939116 ACGTTGGATGACATAGGTAGTCAGGCACTCACGTTGGATGGCAGCTCTTTTTTTCCTACC 1939117 ACGTTGGATGGGGAACTTTTCACATTACACACGTTGGATGGAGAGTTTGCATTTGGTGATC 1939118 ACGTTGGATGATGTTGCTGTATGGTCCTCCACGTTGGATGGAAAACATTGCGCTAGGCAC 1954769 ACGTTGGATGTGAGTGACCAAGTTGCTCTGACGTTGGATGTCTACCTTCATGATGTCCCC 2000537 ACGTTGGATGGGTCTTTTATGAGGTTTCTCCACGTTGGATGGTTAAACTTACAAATCTAGC 2011913 ACGTTGGATGGCTGAGTGTGGATTGCTCTGACGTTGGATGAGTAAACCAACACCCAGAAC 2015747 ACGTTGGATGTGAAGCAGGCTTTCCCAATGACGTTGGATGGGTAGTGAAGGGTGGAGAAC 2105587 ACGTTGGATGAAGAAATACCAGGCCGGGAGACGTTGGATGCTCAAGTATCCTCCCTTCTC 2155081 ACGTTGGATGAGGCAATGCTTCCATTGTTCACGTTGGATGTCATAGCATTTTACCCCTGG 2186617 ACGTTGGATGGCTACATATGGATCTTGGTCACGTTGGATGGACCAGCACTAACTCTAAAC 2508423 ACGTTGGATGCTCCTCTGTAAAACCAGGACACGTTGGATGAGAAACTCTCCTAAGCACAC 2511880 ACGTTGGATGGTTCCCTGATGGAAAATGCCACGTTGGATGCCAGAATGCCTTATCCACAG 2511881 
ACGTTGGATGTGACTCTGCTGTGAGATTGGACGTTGGATGACATCGGTTTCACCTCCAAC 2512990 ACGTTGGATGAGCCAGCAGAGAAAACAGTCACGTTGGATGGCCACTTACTACCTGTTGTC 2555537 ACGTTGGATGGGACATAACCATAGGCCATCACGTTGGATGCATTGACAGCTGTATTGCAC 3016250 ACGTTGGATGTTTTTGAGACGGAGTCTCGCACGTTGGATGAGGCAGGAGAATGGCGTGAA 3016251 ACGTTGGATGAGCTTGCAGTGAGCCGAGATACGTTGGATGTTTTTGAGACGGAGTCTCGC 3016252 ACGTTGGATGTGGTGAAGAGAAGTCAAAGCACGTTGGATGAGGCTGAATGATTCCCCTTC 3781614 ACGTTGGATGTGGTCAGTCAGTTAGCCAGGACGTTGGATGCCCTAATGATGGTAGACTGC 3809048 ACGTTGGATGACCACCAAGATAACGACCGCACGTTGGATGAGCCACCTCCTTGTCCAGTG 4128368 ACGTTGGATGGGACAATATTTAGTTATGCACACGTTGGATGTTCAAGGTCATCCCGTTATC

TABLE 29 dbSNP Extend Term rs# Primer Mix 744293 GATGGCCCAGTTCCCTGCC ACG750789 AGAGGCCCTTCCAGGGCT ACT 1939110 CGTCCCTGACCTGGACTTA ACG 2005192AATGCAGACACAGTTCTGGG CGT SNP00004859 CTGAAAAATAGCTAGTTC ACG 744292ACTCACCTCTACCCATAAGG ACT 754490 TTGAAGCAGGCTTTCCCA ACT 872619TGTGTTCAGGGCTTTCTCAT ACT 1807014 GTGTTTTTTTTTTCCCCC ACG 1815753CAGGCATTCGAGCCAGCAAT ACT 1892893 ATGTTTTATTCTTTCACAAAAGT ACT 1939111GGAGGAGGCAGTAAGGAA ACT 1939112 CTTCCAACTTTTTTCTCTTG ACT 1939113GTCTAGTCCTCCAAGCC ACG 1939114 ATCAATGGGGTGGTGCA ACT 1939115TCTGTTCTTTAGAAGGCT CGT 1939116 TGTACCAATATGACAATTTAACC ACT 1939117CCTGACACATAGTTCATGCTC ACT 1939118 GCTAGGCACAAAATTAAAGAGAT ACT 1954769TCCCCGCCTTTCCCTCC CGT 2000537 ACAAATCTAGCACCGAAGG ACT 2011913ATATAAGCAATTCACAAGTAATGT ACT 2015747 AAGGGTGGAGAACTCATGG ACT 2105587TATCCTCCCTTCTCAGCAAG ACT 2155081 CATTTTACCCCTGGATTATA ACT 2186617CTCAACCTCAACTCAACT CGT 2508423 TCTCCTAAGCACACTATGTATAT ACG 2511880AGGATATTAGTCATGCTGGG ACT 2511881 CACCTCCAACACGGTCCCC CGT 2512990GTTGTCTTCCCAACTCC ACT 2555537 ACTGTGGACATTGGTGT ACT 3016250GGCGTGAACCCGGGAGG ACG 3016251 CTGTCGCCCAGGCCGGA ACT 3016252GATTCCCCTTCTTCTAAA ACT 3781614 TAGACTGCAGAGTAGCA ACT 3809048TGGGCCTACTTCCCTGA ACT 4128368 TTTTCATCACATAGCTCATCT CGT

Genetic Analysis of Allelotyping Results

Allelotyping results are shown for cases and controls in Table 30. The allele frequency for the A2 allele is noted in the fifth and sixth columns for breast cancer pools and control pools, respectively, where “AF” is allele frequency. The allele frequency for the A1 allele can be easily calculated by subtracting the A2 allele frequency from 1 (A1 AF=1−A2 AF). For example, the SNP rs1894003 has the following case and control allele frequencies: case A1 (T)=0.192; case A2 (C)=0.808; control A1 (T)=0.115; and control A2 (C)=0.885, where the nucleotide is provided in parentheses. SNPs with blank allele frequencies were untyped.
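The complement relationship described above can be sketched in a few lines of Python. This is an illustration only; the function name `a1_frequency` is hypothetical and the input values are the rs1894003 frequencies quoted in the preceding paragraph.

```python
def a1_frequency(a2_frequency):
    """Return the A1 allele frequency from the A2 allele frequency (A1 AF = 1 - A2 AF)."""
    if not 0.0 <= a2_frequency <= 1.0:
        raise ValueError("allele frequency must lie between 0 and 1")
    return 1.0 - a2_frequency


# rs1894003: the A2 (C) allele frequency is 0.808 in cases and 0.885 in controls.
case_a1 = round(a1_frequency(0.808), 3)
control_a1 = round(a1_frequency(0.885), 3)
print(case_a1, control_a1)
```

Rounding to three decimals mirrors the precision used in Table 30.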

TABLE 30

dbSNP rs#    Position in FIG. 4   Chromosome Position   A1/A2 Allele   A2 Case AF   A2 Control AF   p-Value
1894003      174                  71972974              T/C            0.808        0.885           0.00061
2390981      815                  71973615              G/A            0.013        0.002           0.02306
1939242      3480                 71976280              C/T            0.902        0.943           0.01186
1894004      9715                 71982515              T/C            0.020        0.009           0.12637
645603       14755                71987555              G/A            0.029        0.021           0.37479
661290       15912                71988712              A/G            0.813        0.833           0.39013
679926       19834                71992634              A/G            0.077        0.039           0.00741
567026       19850                71992650              G/A            0.059        0.038           0.09767
678193       20171                71992971              T/G            0.868        0.920           0.00597
560777       20500                71993300              C/T            0.070        0.041           0.03071
676721       20536                71993336              C/T            0.901        0.947           0.00419
585228       23187                71995987              C/G            0.842        0.914           0.00043
674319       25289                71998089              C/T            0.027        0.027           0.96556
675185       25470                71998270              T/G            0.763        0.853           0.00031
575871       28720                72001520              A/G            0.924        0.932           0.61199
547208       29566                72002366              C/T            0.042        0.023           0.07555
2511075      30155                72002955              T/C            0.894        0.944           0.00256
642573       30752                72003552              C/G            0.047        0.022           0.02382
671681       32710                72005510              C/T            0.072        0.043           0.03643
541022       32954                72005754              A/G            0.070        0.040           0.02829
2511076      33725                72006525              G/A            0.223        0.256           0.20380
3018308      33842                72006642              T/C            0.442        0.439           0.92279
671132       36345                72009145              G/A            0.970        0.971           0.96469
552966       38115                72010915              A/C            0.845        0.903           0.00393
607446       39150                72011950              C/T            0.861        0.918           0.00279
3018302      40840                72013640              T/G            0.767        0.827           0.01378
3018301      41969                72014769              A/G            0.734        0.837           0.00011
2511114      42045                72014845              C/T            0.080        0.036           0.00222
548961       43785                72016585              G/A            0.852        0.905           0.00833
575831       44444                72017244              A/G            0.946        0.961           0.22995
577435       44579                72017379              T/C            0.013        0.007           0.34863
495567       45386                72018186              C/T            0.891        0.951           0.00045
493065       46827                72019627              A/G            0.823        0.904           0.00022
597513       47320                72020120              A/T            0.890        0.936           0.00667
598835       47625                72020425              T/C            0.074        0.038           0.00994
610004       47837                72020637              T/C            0.088        0.041           0.00209
610041       47866                72020666              A/G            0.872        0.933           0.00102
673478       49002                72021802              T/C            0.173        0.094           0.00026
670802       49566                72022366              T/G            0.876        0.920           0.01646
2511116      52058                72024858              C/T            0.898        0.945           0.00437
NUMA1_SNP1   52249                72025049              A/C            0.901        0.924           0.17421
517837       52257                72025057              C/T            0.095        0.061           0.03504
615000       52850                72025650              T/G            0.812        0.916           0.00001
482013       53860                72026660              C/T            0.884        0.924           0.02391
NUMA1_SNP2   54052                72026852              T/C            0.066        0.034           0.01392
2250866      54411                72027211              T/C            0.855        0.918           0.00132
2511078      55098                72027898              G/A            0.299        0.295           0.86946
2508858      55303                72028103              C/G            0.898        0.944           0.00509
681069       59398                72032198              A/G            0.835        0.878           0.04069
595062       59533                72032333              A/G            0.925        0.942           0.25198
542752       60542                72033342              A/T            0.853        0.915           0.00192
2508856      61541                72034341              C/T            0.074        0.060           0.33745
832658       62309                72035109              G/A            0.047        0.023           0.02994
3750908      72299                72045099              C/T            0.912        0.944           0.04342
3793938      73031                72045831              C/T            0.084        0.045           0.00763
2276396      73803                72046603              G/C            0.892        0.937           0.00799
1806778      80950                72053750              T/C            0.041        0.034           0.50886
4073394      82137                72054937              A/G            0.547        0.579           0.28705
471547       96077                72068877              G/T            0.490        0.522           0.28304
606136       96470                72069270              A/G            0.444        0.468           0.43474
532360       98116                72070916              G/T            0.043        0.021           0.03475
703781       98184                72070984              A/C            0.078        0.080           0.89053
476753       132952               72105752              A/G            0.922        0.936           0.39563

FIG. 17 shows the proximal SNPs in and around the NUMA1 region for females. The position of each SNP on the chromosome is presented on the x-axis. The y-axis gives the negative logarithm (base 10) of the p-value comparing the estimated allele frequency in the case group to that of the control group. The minor allele frequency of the control group for each SNP designated by an X or other symbol on the graphs in FIG. 17 can be determined by consulting Table 30. By proceeding down the Table from top to bottom and across the graphs from left to right the allele frequency associated with each symbol shown can be determined.

To aid the interpretation, multiple lines have been added to the graph. The broken horizontal lines are drawn at two common significance levels, 0.05 and 0.01. The vertical broken lines are drawn every 20 kb to assist in the interpretation of distances between SNPs. Two other lines are drawn to expose linear trends in the association of SNPs to the disease. The light gray line (or generally bottom-most curve) is a nonlinear smoother through the data points on the graph using a local polynomial regression method (W. S. Cleveland, E. Grosse and W. M. Shyu (1992) Local regression models. Chapter 8 of Statistical Models in S eds J. M. Chambers and T. J. Hastie, Wadsworth & Brooks/Cole.). The black line (or generally top-most curve, e.g., see peak in left-most graph just to the left of position 92150000) provides a local test for excess statistical significance to identify regions of association. This was created by use of a 10 kb sliding window with 1 kb step sizes. Within each window, a chi-square goodness of fit test was applied to compare the proportion of SNPs that were significant at a test wise level of 0.01, to the proportion that would be expected by chance alone (0.05 for the methods used here). Resulting p-values that were less than 10⁻⁸ were truncated at that value.
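The sliding-window test described above might be sketched as follows. This is a minimal illustration and not the implementation used to produce FIG. 17: the function names, the two-category form of the goodness of fit statistic (significant versus not significant, one degree of freedom), and the handling of empty windows are all assumptions. The window width, step size, significance level, expected proportion and truncation value are exposed as parameters and default to the values stated in the text.

```python
import math


def chi2_upper_tail_1df(stat):
    """Upper-tail probability of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2.0))


def window_excess_test(snps, window=10_000, step=1_000, alpha=0.01,
                       expected=0.05, floor=1e-8):
    """Scan (position, p_value) pairs with a sliding window.

    Returns a list of (window_start, p) for every window holding at least one SNP.
    """
    if not snps:
        return []
    positions = [pos for pos, _ in snps]
    start, last = min(positions), max(positions)
    results = []
    while start <= last:
        in_window = [p for pos, p in snps if start <= pos < start + window]
        n = len(in_window)
        if n > 0:
            observed = sum(1 for p in in_window if p < alpha)
            exp_sig = n * expected
            exp_non = n * (1.0 - expected)
            # Two-category goodness of fit statistic (assumed form).
            stat = ((observed - exp_sig) ** 2 / exp_sig
                    + ((n - observed) - exp_non) ** 2 / exp_non)
            # Truncate very small p-values at the floor, as described in the text.
            results.append((start, max(chi2_upper_tail_1df(stat), floor)))
        start += step
    return results


# Five neighbouring SNPs taken from Table 30 (chromosome position, p-value).
example = [(71992634, 0.00741), (71992650, 0.09767), (71992971, 0.00597),
           (71993300, 0.03071), (71993336, 0.00419)]
windows = window_excess_test(example)
```

Because the statistic is symmetric, a window with fewer significant SNPs than expected would also yield a small p-value; a one-sided variant would need an explicit check that the observed count exceeds the expected count.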

Finally, the gene or genes present in the loci region of the proximal SNPs as annotated by Locus Link (world wide web address: ncbi.nlm.nih.gov/LocusLink/) are provided on the graph. The exons and introns of the genes in the covered region are plotted below each graph at the appropriate chromosomal positions. The gene boundary is indicated by the broken horizontal line. The exon positions are shown as thick, unbroken bars. An arrow is placed at the 3′ end of each gene to show the direction of transcription.

Example 8 Meta Analysis of Incident SNPs

A meta-analysis was performed on five of the incident SNPs disclosed in Table 3 (ICAM region (ICAM_SNP), MAPK10 (rs1541998), KIAA0861 (rs2001449), NUMA1 region (rs673478) and GALE region (rs4237)) based on genotype results provided in Table 6B. FIGS. 18-21 depict odds ratios for the discovery samples and replication samples (see Example 3) individually, and the combined meta analysis odds ratio for the named SNP. The boxes are centered over the odds ratio for each sample, with the size of the box correlated to the contribution of each sample to the combined meta analysis odds ratio. The lines extending from each box are the 95% confidence interval values. The diamond is centered over the combined meta analysis odds ratio with the ends of the diamond depicting the 95% confidence interval values. The meta-analysis further illustrates the strong association each of the incident SNPs has with breast cancer across multiple case and control samples.

The subjects available for discovery from Germany included 272 cases and 276 controls. The subjects available for replication from Australia included 190 breast cancer cases and 190 controls. Meta analyses, combining the results of the German discovery sample and the Australian replication sample, were carried out using a random effects (DerSimonian-Laird) procedure.
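A random effects (DerSimonian-Laird) pooling of per-sample odds ratios can be sketched as below. This is a generic textbook form of the estimator, offered as an illustration rather than the procedure actually run for FIGS. 18-21; the function name is hypothetical, the inputs are log odds ratios with their variances, and the example values are invented for demonstration and are not results from the study.

```python
import math


def dersimonian_laird(log_ors, variances):
    """Pool per-study log odds ratios with the DerSimonian-Laird estimator.

    Returns (pooled_or, ci_low, ci_high, tau2), where tau2 is the estimated
    between-study variance and the interval is a 95% confidence interval.
    """
    k = len(log_ors)
    w = [1.0 / v for v in variances]  # inverse-variance (fixed effect) weights
    fixed = sum(wi * yi for wi, yi in zip(w, log_ors)) / sum(w)
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, log_ors))  # Cochran's Q
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (k - 1)) / c) if k > 1 and c > 0 else 0.0
    w_star = [1.0 / (v + tau2) for v in variances]  # random effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_star, log_ors)) / sum(w_star)
    se = math.sqrt(1.0 / sum(w_star))
    return (math.exp(pooled),
            math.exp(pooled - 1.96 * se),
            math.exp(pooled + 1.96 * se),
            tau2)


# Hypothetical discovery and replication inputs, for illustration only.
pooled_or, ci_low, ci_high, tau2 = dersimonian_laird(
    [math.log(1.6), math.log(1.4)], [0.04, 0.06])
```

When the two samples agree closely, the between-study variance estimate is zero and the result coincides with a fixed effect (inverse-variance) pooled estimate.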

Example 9 Description of Development of Predictive Breast Cancer Models

The five SNPs reported in Example 3 were identified as being significantly associated with breast cancer according to the replication analysis discussed therein. These five SNPs are a subset of the panel of SNPs associated with breast cancer in the German cohort referenced in Example 1 and reported in provisional patent application No. 60/429,136 filed Nov. 25, 2002 and provisional patent application No. 60/490,234 filed Jul. 24, 2003, having attorney docket number 524593004100 and 524593004101, respectively.

The clinical importance of these SNPs was estimated by combining them into a single logistic regression model. The coefficients of the model were used to estimate penetrance, relative risk and odds ratio values for estimating a subject's risk of having or developing breast cancer according to the subject's genotype. Penetrance is a probability that an individual has or will have breast cancer given their genotype (e.g., a value of 0.01 in the tables is equal to a 1% chance of having or developing breast cancer). The relative risk of breast cancer is based upon penetrance values, and is expressed in two forms. One form, noted as RR in the tables below, is expressed as a risk with respect to the lowest risk group (e.g., the most protected group being the 00000 genotype listed in Table 33). The other form is expressed as a risk with respect to a population average risk of breast cancer, which is noted as RR(Pop) in Table 35 below. Both of these expressions of relative risk are useful to a clinician for assessing risk of breast cancer in an individual and targeting appropriate detection, prevention and/or treatment regimens to the subject. Both expressions of relative risk also are useful to an insurance company to assess population risks of breast cancer (e.g., for developing actuarial tables), where individual genotypes often are provided to the company on an anonymous basis. Odds ratios are the odds one group has or will develop breast cancer with respect to another group, the other group often being the most protective group or the group having a population average risk of breast cancer. Relative risk often is a more reliable assessment of risk in comparison to an odds ratio when the disease or condition at issue is more prevalent.

To fit the single logistic model, all cases and controls from the German and Australian samples were used (see Examples 1 and 3, respectively). Controls were coded as 0 and cases were coded as 1. Based on the genotype penetrance estimates of each SNP (Table 31), GP01.025495354 (rs4237), GP03.197942797 (rs2001449), and GP11.079035103 (rs673478) were modeled as additive by coding the genotypes 0, 1, or 2 for the low risk homozygote, the heterozygote, or high risk homozygote, respectively. The SNP FCH.0994 (ICAM_SNP1) was modeled as recessive, coding the genotypes 0, 0, or 2 for the low risk homozygote, heterozygote, or high risk homozygote, respectively. The SNP GP04.091348915 (rs1541998) was modeled as dominant, coding the genotypes 0, 2, or 2 for the low risk homozygote, the heterozygote, or high risk homozygote, respectively. Table 31 summarizes this analysis.

TABLE 31

SNP         Genotype   N     Case (N = 254)   Control (N = 268)   P(D|G) (%)   P-value
ICAM_SNP1   CC         497   45% (103)        32% (85)            4.140        0.006210
            CT               42% (98)         47% (126)           2.700
            TT               13% (30)         21% (13)            1.910
rs4237      AA         494   34% (79)         29% (75)            3.550        0.186000
            AG               49% (113)        48% (126)           3.040
            GG               17% (40)         23% (61)            2.240
rs2001449   GG         508   46% (112)        60% (158)           2.280        0.002930
            GC               48% (117)        36% (94)            3.940
            CC               7% (17)          4% (10)             5.300
rs673478    TT         509   84% (206)        91% (240)           2.800        0.040700
            TC               14% (35)         9% (25)             4.490
            CC               1% (3)           0% (0)              100.00
rs1541998   CC         493   5% (12)          4% (10)             3.710        0.012100
            CT               36% (87)         24% (61)            4.370
            TT               59% (143)        72% (180)           2.490

Based on this coding, there are a total of 108 unique genotype codes from the 243 unique five SNP genotypes. The relationship between the five SNP genotypes and the case-control status was fit using logistic regression. Many models were fit and compared including the five SNPs and all possible interactions among SNPs and study center. Only statistically significant terms from this complete model were included in the final model, shown in Table 32.
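The genotype coding and the count of 108 codes can be checked with a short enumeration. The sketch below is illustrative: the names `CODING`, `SNP_MODELS` and `genotype_code` are hypothetical, each genotype is represented by its number of high risk alleles (0, 1 or 2), and the SNP order follows the order of the digits in the five-digit genotype codes (FCH.0994, 4237, 2001449, 1541998, 673478).

```python
from itertools import product

# Code assigned to 0, 1 or 2 copies of the high risk allele under each model.
CODING = {
    "additive": (0, 1, 2),
    "recessive": (0, 0, 2),
    "dominant": (0, 2, 2),
}

# Inheritance model per SNP, in the digit order of the genotype codes.
SNP_MODELS = [
    ("FCH.0994", "recessive"),
    ("4237", "additive"),
    ("2001449", "additive"),
    ("1541998", "dominant"),
    ("673478", "additive"),
]


def genotype_code(high_risk_allele_counts):
    """Map five per-SNP high risk allele counts to a five-digit code string."""
    if len(high_risk_allele_counts) != len(SNP_MODELS):
        raise ValueError("expected one allele count per SNP")
    return "".join(
        str(CODING[model][count])
        for (_, model), count in zip(SNP_MODELS, high_risk_allele_counts)
    )


all_genotypes = list(product(range(3), repeat=len(SNP_MODELS)))
unique_codes = {genotype_code(g) for g in all_genotypes}
print(len(all_genotypes), len(unique_codes))  # 243 108
```

The reduction from 243 to 108 follows from the recessive and dominant SNPs each contributing two distinct codes instead of three (2 x 3 x 3 x 2 x 3 = 108).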

TABLE 32

                          Estimate   Std. Error   z value   Pr(>|z|)
(Intercept)               −1.34446   0.25972      −5.177    2.26e−07
FCH.0994                  0.77607    0.19835      3.913     9.13e−05
4237                      0.54525    0.17666      3.086     0.002025
2001449                   0.60383    0.28487      2.120     0.034033
1541998                   0.22051    0.07849      2.809     0.004963
673478                    0.59961    0.21737      2.758     0.005807
FCH.0994c:4237            −0.52636   0.14516      −3.626    0.000288
FCH.0994c:2001449         −0.35613   0.24503      −1.453    0.146113
4237c:2001449             −0.15685   0.20191      −0.777    0.437257
FCH.0994c:4237c:2001449   0.41305    0.18391      2.246     0.024705

Null deviance: 1136.7 on 820 degrees of freedom
Residual deviance: 1069.6 on 811 degrees of freedom
AIC: 1089.6

The penetrance was calculated for each of the 108 unique genotype codes using this model and an assumed disease prevalence of 0.03 (prev), the cumulative incidence for the age range of the sample in question. This was calculated from the logistic model as follows:

penetrance=exp(ŷ+adj)/(1+exp(ŷ+adj))

where

ŷ=1/(1+exp(−1.344+0.776*A+0.545*B+0.604*C+0.221*D+0.600*E−0.526*A*B−0.356*A*C−0.157*B*C+0.413*A*B*C))

and

adj=ln(prev/(1−prev)*freq(case)/(1−freq(case))).

Here A, B, C, D, and E refer to the genotype codes for the SNPs FCH.0994, 4237, 2001449, 1541998, and 673478, respectively.

Table 33 summarizes statistics of interest for each genotype code. “Geno” shows each genotype code with the five integer codes formatted as an integer string. “N Case” and “N Control” are the numbers of cases and controls with the specified code, respectively. “Frequency” is the expected percent of individuals in the population having that code calculated as the average of the case and control frequencies weighted by the probability of disease in this sample (0.03). “OR” is the odds ratio comparing the odds of the specified code to the odds of the most protective code (00000) using the parameter estimates from the logistic regression model. “OR (Frq)” is an odds ratio estimated using the frequency of cases and controls with the specified genotype code and the most protective code. “RR” is the relative risk comparing the probability of disease of the specified code to the probability of disease of the most protective code. “Penetrance” is the probability of disease given the genotype code, followed by “Lower” and “Upper” which give the 95% confidence interval for the penetrance. As can be seen by the ratios for OR and RR, the 00000 genotype was the most protective against breast cancer occurrence.

TABLE 33 Confidence Interval Geno N Case N Control Frequency OR OR (Frq)RR Penetrance Lower Upper 00000 6 26 5.94% 1.00 1.00 1.00 0.010 0.0070.014 00001 0 3 0.68% 1.75 0.00 1.74 0.017 0.011 0.029 00002 0 0 0.00%3.08 3.01 0.030 0.013 0.069 00020 3 9 2.06% 1.61 1.44 1.60 0.016 0.0110.023 00021 0 3 0.68% 2.83 0.00 2.78 0.028 0.017 0.047 00022 0 0 0.00%4.97 4.78 0.048 0.021 0.108 00100 9 20 4.60% 1.67 1.95 1.66 0.017 0.0120.023 00101 2 1 0.24% 2.93 8.67 2.87 0.029 0.018 0.047 00102 0 0 0.00%5.13 4.93 0.050 0.022 0.110 00120 7 6 1.41% 2.69 5.06 2.65 0.027 0.0180.038 00121 0 0 0.00% 4.73 4.56 0.046 0.028 0.075 00122 0 0 0.00% 8.297.72 0.078 0.034 0.168 00200 1 4 0.91% 2.78 1.08 2.74 0.027 0.018 0.04200201 0 0 0.00% 4.88 4.70 0.047 0.027 0.082 00202 0 0 0.00% 8.57 7.960.080 0.034 0.178 00220 1 1 0.23% 4.50 4.33 4.34 0.044 0.027 0.070 002211 0 0.01% 7.89 7.38 0.074 0.041 0.129 00222 0 0 0.00% 13.83 12.25 0.1230.052 0.263 01000 24 47 10.84% 1.26 2.21 1.26 0.013 0.010 0.016 01001 31 0.25% 2.21 13.00 2.18 0.022 0.014 0.034 01002 0 0 0.00% 3.87 3.770.038 0.017 0.083 01020 18 22 5.12% 2.03 3.55 2.01 0.020 0.015 0.02701021 4 4 0.94% 3.57 4.33 3.48 0.035 0.022 0.055 01022 0 0 0.00% 6.265.94 0.060 0.027 0.129 01100 21 33 7.64% 2.10 2.76 2.08 0.021 0.0170.026 01101 2 4 0.92% 3.69 2.17 3.59 0.036 0.024 0.055 01102 0 0 0.00%6.47 6.13 0.062 0.028 0.130 01120 15 6 1.47% 3.39 10.83 3.31 0.033 0.0250.045 01121 0 0 0.00% 5.95 5.67 0.057 0.036 0.089 01122 0 0 0.00% 10.449.54 0.096 0.044 0.198 01200 5 4 0.94% 3.51 5.42 3.42 0.034 0.023 0.05001201 0 1 0.23% 6.15 0.00 5.85 0.059 0.035 0.097 01202 0 0 0.00% 10.799.82 0.099 0.044 0.209 01220 1 0 0.01% 5.66 5.41 0.054 0.035 0.083 012210 0 0.00% 9.93 9.12 0.092 0.054 0.152 01222 0 0 0.00% 17.42 14.95 0.1500.067 0.304 02000 22 39 9.01% 1.59 2.44 1.58 0.016 0.012 0.021 02001 2 10.24% 2.78 8.67 2.73 0.027 0.017 0.043 02002 1 0 0.01% 4.88 4.70 0.0470.021 0.103 02020 16 10 2.39% 2.56 6.93 2.52 0.025 0.018 0.035 02021 2 20.47% 4.49 4.33 4.34 0.044 
0.027 0.070 02022 2 0 0.02% 7.88 7.37 0.0740.033 0.158 02100 21 18 4.24% 2.65 5.06 2.60 0.026 0.020 0.035 02101 5 30.72% 4.64 7.22 4.48 0.045 0.029 0.070 02102 0 0 0.00% 8.14 7.60 0.0760.035 0.160 02120 11 8 1.90% 4.28 5.96 4.14 0.042 0.030 0.058 02121 1 00.01% 7.50 7.04 0.071 0.044 0.112 02122 0 0 0.00% 13.15 11.72 0.1180.054 0.239 02200 4 4 0.94% 4.42 4.33 4.27 0.043 0.028 0.065 02201 3 10.25% 7.75 13.00 7.26 0.073 0.043 0.121 02202 0 0 0.00% 13.59 12.060.121 0.053 0.252 02220 2 1 0.24% 7.13 8.67 6.72 0.068 0.043 0.106 022210 0 0.00% 12.51 11.21 0.113 0.065 0.189 02222 0 0 0.00% 21.94 18.130.182 0.082 0.358 20000 9 6 1.43% 1.58 6.50 1.57 0.016 0.011 0.023 200010 0 0.00% 2.76 2.72 0.027 0.016 0.045 20002 0 0 0.00% 4.85 4.67 0.0470.020 0.105 20020 8 4 0.97% 2.54 8.67 2.51 0.025 0.017 0.037 20021 0 00.00% 4.46 4.31 0.043 0.026 0.072 20022 0 0 0.00% 7.83 7.33 0.074 0.0320.161 20100 5 6 1.40% 2.63 3.61 2.59 0.026 0.018 0.037 20101 4 1 0.26%4.61 17.33 4.45 0.045 0.027 0.072 20102 0 0 0.00% 8.09 7.55 0.076 0.0330.163 20120 4 1 0.26% 4.25 17.33 4.11 0.041 0.028 0.060 20121 1 0 0.01%7.45 6.99 0.070 0.042 0.115 20122 0 0 0.00% 13.06 11.65 0.117 0.0520.242 20200 0 1 0.23% 4.39 0.00 4.24 0.043 0.027 0.066 20201 1 0 0.01%7.70 7.21 0.072 0.041 0.124 20202 0 0 0.00% 13.50 11.99 0.121 0.0520.255 20220 0 0 0.00% 7.09 6.68 0.067 0.041 0.108 20221 0 0 0.00% 12.4311.15 0.112 0.063 0.192 20222 0 0 0.00% 21.80 18.03 0.181 0.080 0.36121000 22 25 5.83% 1.99 3.81 1.97 0.020 0.015 0.026 21001 3 4 0.93% 3.483.25 3.40 0.034 0.022 0.053 21002 1 0 0.01% 6.11 5.81 0.058 0.026 0.12521020 11 14 3.26% 3.21 3.40 3.14 0.032 0.023 0.043 21021 1 2 0.46% 5.622.17 5.37 0.054 0.034 0.085 21022 0 0 0.00% 9.86 9.05 0.091 0.041 0.19021100 26 24 5.64% 3.31 4.69 3.24 0.033 0.025 0.042 21101 1 2 0.46% 5.812.17 5.54 0.056 0.036 0.085 21102 1 0 0.01% 10.19 9.33 0.094 0.043 0.19121120 16 6 1.48% 5.35 11.56 5.12 0.051 0.037 0.071 21121 4 0 0.03% 9.388.65 0.087 0.055 0.135 21122 0 0 0.00% 16.45 14.24 0.143 0.067 
0.28121200 3 1 0.25% 5.53 13.00 5.29 0.053 0.036 0.078 21201 3 0 0.02% 9.698.92 0.090 0.054 0.146 21202 0 0 0.00% 17.00 14.65 0.147 0.067 0.29521220 2 2 0.47% 8.93 4.33 8.27 0.083 0.053 0.127 21221 1 0 0.01% 15.6513.65 0.137 0.081 0.223 21222 0 0 0.00% 27.46 21.69 0.218 0.101 0.40922000 13 23 5.31% 2.50 2.45 2.46 0.025 0.018 0.034 22001 4 1 0.26% 4.3917.33 4.24 0.043 0.027 0.068 22002 0 1 0.23% 7.69 0.00 7.21 0.072 0.0320.154 22020 3 10 2.29% 4.04 1.30 3.92 0.039 0.027 0.056 22021 1 0 0.01%7.08 6.67 0.067 0.041 0.107 22022 0 0 0.00% 12.42 11.14 0.112 0.0510.230 22100 15 5 1.25% 4.17 13.00 4.04 0.041 0.030 0.055 22101 1 0 0.01%7.32 6.88 0.069 0.044 0.107 22102 0 0 0.00% 12.83 11.47 0.115 0.0530.232 22120 3 5 1.16% 6.74 2.60 6.37 0.064 0.045 0.091 22121 3 1 0.25%11.82 13.00 10.66 0.107 0.066 0.168 22122 0 0 0.00% 20.72 17.30 0.1740.081 0.333 22200 4 0 0.03% 6.96 6.57 0.066 0.043 0.100 22201 0 0 0.00%12.21 10.97 0.110 0.065 0.181 22202 0 0 0.00% 21.42 17.77 0.179 0.0810.348 22220 4 1 0.26% 11.24 17.33 10.19 0.102 0.064 0.160 22221 0 00.00% 19.72 16.60 0.167 0.097 0.271 22222 0 0 0.00% 34.58 25.86 0.2600.122 0.470
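The “Frequency” and “OR (Frq)” columns of Table 33 can be illustrated with the sketch below. The function names are hypothetical, and the case and control totals (393 and 428) are an assumption obtained by summing the “N Case” and “N Control” columns of Table 35; they are not stated explicitly in the text.

```python
PREVALENCE = 0.03      # assumed disease prevalence stated in the text
TOTAL_CASES = 393      # assumption: sum of the "N Case" column of Table 35
TOTAL_CONTROLS = 428   # assumption: sum of the "N Control" column of Table 35


def population_frequency(n_case, n_control, prevalence=PREVALENCE):
    """Prevalence-weighted average of the case and control genotype frequencies."""
    case_freq = n_case / TOTAL_CASES
    control_freq = n_control / TOTAL_CONTROLS
    return prevalence * case_freq + (1.0 - prevalence) * control_freq


def odds_ratio_frq(n_case, n_control, ref_case, ref_control):
    """Odds ratio from raw counts, relative to a reference genotype code."""
    return (n_case / n_control) / (ref_case / ref_control)


# Genotype code 00000 has 6 cases and 26 controls; code 00020 has 3 and 9.
freq_00000 = round(100 * population_frequency(6, 26), 2)
or_frq_00020 = round(odds_ratio_frq(3, 9, 6, 26), 2)
print(freq_00000, or_frq_00020)
```

Under these assumptions the two values agree with the 5.94% frequency of code 00000 and the OR (Frq) of 1.44 for code 00020 listed in Table 33. The count-based odds ratio is undefined when a code has no controls, which is consistent with the blank entries in that column.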

To simplify the interpretation of genotype risk, the 243 unique genotypes were divided into five risk classes on the basis of each estimated penetrance. The levels selected for risk class definitions and the resulting assignment of genotypes into five risk classes are shown in Table 34. The frequency percent of each genotype combination is given in parentheses.

TABLE 34 Class 2 Class 3 Class 1 (0.013, (0.025, Class 4 Class 5 (0,0.013] 0.025] 0.042] (0.042, 0.1] (0.1, 1) 00000 (5.94) 00001 (0.68)00022 (0.00) 00102 (0.00) 00222 (0.00) 00020 (2.06) 00002 (0.00) 00121(0.00) 00122 (0.00) 01222 (0.00)  01000 (10.84) 00021 (0.68) 00220(0.23) 00201 (0.00) 02022 (0.02) 22000 (5.31) 00100 (4.60) 01002 (0.00)00202 (0.00) 02122 (0.00) 00101 (0.24) 01021 (0.94) 00221 (0.01) 02202(0.00) 00120 (1.41) 01101 (0.92) 01022 (0.00) 02221 (0.00) 00200 (0.91)01120 (1.47) 01102 (0.00) 02222 (0.00) 01001 (0.25) 01200 (0.94) 01121(0.00) 20002 (0.00) 01020 (5.12) 02001 (0.24) 01122 (0.00) 20022 (0.00)01100 (7.64) 02020 (2.39) 01201 (0.23) 20122 (0.00) 02000 (9.01) 02100(4.24) 01202 (0.00) 20222 (0.00) 21000 (5.83) 02200 (0.94) 01220 (0.01)21102 (0.01) 22001 (0.26) 20000 (1.43) 01221 (0.00) 21122 (0.00) 22020(2.29) 20100 (1.40) 02002 (0.01) 21201 (0.02) 20200 (0.23) 02021 (0.47)21202 (0.00) 20220 (0.00) 02101 (0.72) 21221 (0.01) 21001 (0.93) 02102(0.00) 21222 (0.00) 21020 (3.26) 02120 (1.90) 22102 (0.00) 21100 (5.64)02121 (0.01) 22121 (0.25) 22002 (0.23) 02201 (0.25) 22122 (0.00) 22021(0.01) 02220 (0.24) 22200 (0.03) 22100 (1.25) 20001 (0.00) 22201 (0.00)20020 (0.97) 22202 (0.00) 20021 (0.00) 22220 (0.26) 20101 (0.26) 22221(0.00) 20102 (0.00) 22222 (0.00) 20120 (0.26) 20121 (0.01) 20201 (0.01)20202 (0.00) 20221 (0.00) 21002 (0.01) 21021 (0.46) 21022 (0.00) 21101(0.46) 21120 (1.48) 21121 (0.03) 21200 (0.25) 21220 (0.47) 22022 (0.00)22101 (0.01) 22120 (1.16)
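Assignment of a genotype to a risk class from its estimated penetrance can be sketched as a threshold lookup. The class boundaries below are the interval limits given in the header of Table 34; the function name `risk_class` is hypothetical.

```python
from bisect import bisect_left

# Upper bounds of risk classes 1 through 4; class 5 covers (0.1, 1).
UPPER_BOUNDS = [0.013, 0.025, 0.042, 0.1]


def risk_class(penetrance):
    """Return the risk class (1 to 5) for a penetrance strictly between 0 and 1."""
    if not 0.0 < penetrance < 1.0:
        raise ValueError("penetrance must lie strictly between 0 and 1")
    # bisect_left places a value equal to a bound in the lower class,
    # matching the right-closed intervals such as (0, 0.013].
    return bisect_left(UPPER_BOUNDS, penetrance) + 1


print(risk_class(0.010), risk_class(0.017), risk_class(0.260))
```

A penetrance that falls exactly on a boundary, such as 0.013, is placed in the lower class because each interval is closed on the right.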

With this classification, each genotype was recoded as belonging to its respective class and a logistic regression model was fit with the genotype risk class as a categorical variable. Key summary statistics are summarized in Table 35. Each group is described by the number of cases, number of controls, the estimated risk class population frequency, the odds ratio comparing the odds of the given risk class compared to the odds of the lowest risk class, the penetrance, the relative risk (risk class penetrance divided by most protective risk class penetrance), and the population relative risk (risk class penetrance divided by the disease prevalence: 0.03).

TABLE 35

Risk Class   N Case   N Control   Frequency (%)   OR     Penetrance   RR     RR (Pop)
G1           46       105         24.2            1.0    0.012        1.0    0.41
G2           112      168         38.9            1.5    0.019        1.5    0.62
G3           140      113         26.7            2.8    0.034        2.8    1.13
G4           77       40          9.7             4.4    0.052        4.2    1.73
G5           18       2           0.06            20.5   0.204        16.6   6.79
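The two relative risk definitions given for Table 35 can be sketched as follows. The function name is hypothetical and the penetrance values are the rounded figures listed in Table 35, so the computed ratios may differ slightly from the tabulated RR and RR (Pop) values, which were presumably derived from unrounded penetrances.

```python
PREVALENCE = 0.03  # disease prevalence stated in the text

# Penetrance per risk class as listed in Table 35 (rounded values).
PENETRANCE = {"G1": 0.012, "G2": 0.019, "G3": 0.034, "G4": 0.052, "G5": 0.204}


def relative_risks(penetrance, prevalence=PREVALENCE):
    """Return {class: (RR, RR_pop)}.

    RR is the penetrance divided by the penetrance of the most protective
    class; RR_pop is the penetrance divided by the disease prevalence.
    """
    reference = min(penetrance.values())
    return {cls: (p / reference, p / prevalence) for cls, p in penetrance.items()}


risks = relative_risks(PENETRANCE)
for cls, (rr, rr_pop) in risks.items():
    print(f"{cls}: RR={rr:.1f} RR(Pop)={rr_pop:.2f}")
```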

Example 10 Inhibition of ICAM Gene Expression by Transfection of Specific siRNAs

RNAi-based gene inhibition was selected as a rapid way to inhibit expression of ICAM1 in cultured cells. siRNA reagents were selectively designed to target the ICAM1 gene. Algorithms useful for designing siRNA molecules specific for the ICAM1 gene are disclosed at the world wide web address dharmacon.com. siRNA molecules up to 21 nucleotides in length were utilized.

Table 36 summarizes the features of the duplexes that were used in the assays to target ICAM1. A non-homologous siRNA reagent (siGL2 control) was used as a negative control, and a non-homologous siRNA reagent (siRNA_RAD21_1175 control) shown to inhibit the expression of RAD21 and subsequently inhibit cell proliferation was used as a positive control in all of the assays described herein.

TABLE 36

siRNA                                 siRNA Target   Sequence Specificity   SEQ ID NO:
ICAM1_293                             ICAM1          ACAACCGGAAGGUGUAUGA
ICAM1_335                             ICAM1          GCCAACCAAUGUGCUAUUC
ICAM1_604                             ICAM1          GAUCACCAUGGAGCCAAUU
ICAM1_1409                            ICAM1          CUGUCACUCGAGAUCUUGA
siRNA_RAD21_1175 (positive control)   RAD21          GAGUUGGAUAGCAAGACAA
siGL2 (negative control)              GL2            CGUACGCGGAAUACUUCGA

The siRNAs were transfected in cell lines MCF-7 and T-47D using Lipofectamine™ 2000 reagent from Invitrogen, Corp. 2.5 μg or 5.0 μg of siRNA was mixed with 6.25 μl or 12.5 μl Lipofectamine, respectively, and the mixture was added to cells grown in 6-well plates. Their inhibitory effects on ICAM1 gene expression were confirmed by precision expression analysis by MassARRAY (quantitative RT-PCR hME), which was performed on RNA prepared from the transfected cells. See Chunming & Cantor, PNAS 100(6):3059-3064 (2003). Cell viability was measured at 1, 2, 4 and 6 days post-transfection. Absorbance values were normalized relative to Day 1. RNA was extracted with TRIzol reagent as recommended by the manufacturer (Invitrogen, Corp.) followed by cDNA synthesis using SuperScript™ reverse transcriptase.

A cocktail of siRNA molecules described in Table 36 (that target ICAM1) strongly inhibited proliferation of a breast cancer cell line (MCF-7), as shown in FIG. 22. These effects were consistent in all six experiments performed. Each data point is an average of 3 wells of a 96-well plate normalized to values obtained from day 1 post transfection. The specificity of the active siRNAs was confirmed with a negative, non-homologous control siRNA (siGL2), and a positive control, siRNA_RAD21_1175, that targets a known cancer-associated gene, RAD21.

Example 11 In Vitro Production of Target Polypeptides

cDNA is cloned into a pIVEX 2.3-MCS vector (Roche Biochem) using a directional cloning method. A cDNA insert is prepared using PCR with forward and reverse primers having 5′ restriction site tags (in frame) and 5-6 additional nucleotides in addition to 3′ gene-specific portions, the latter of which is typically about twenty to about twenty-five base pairs in length. A Sal I restriction site is introduced by the forward primer and a Sma I restriction site is introduced by the reverse primer. The ends of PCR products are cut with the corresponding restriction enzymes (i.e., Sal I and Sma I) and the products are gel-purified. The pIVEX 2.3-MCS vector is linearized using the same restriction enzymes, and the fragment of the correct size is isolated by gel-purification. Purified PCR product is ligated into the linearized pIVEX 2.3-MCS vector and E. coli cells are transformed for plasmid amplification. The newly constructed expression vector is verified by restriction mapping and used for protein production.

E. coli lysate is reconstituted with 0.25 ml of Reconstitution Buffer; the Reaction Mix is reconstituted with 0.8 ml of Reconstitution Buffer; the Feeding Mix is reconstituted with 10.5 ml of Reconstitution Buffer; and the Energy Mix is reconstituted with 0.6 ml of Reconstitution Buffer. 0.5 ml of the Energy Mix is added to the Feeding Mix to obtain the Feeding Solution. 0.75 ml of Reaction Mix, 50 μl of Energy Mix, and 10 μg of the template DNA are added to the E. coli lysate to obtain the Reaction Solution.

Using the reaction device (Roche Biochem), 1 ml of the Reaction Solution is loaded into the reaction compartment. The reaction device is turned upside-down and 10 ml of the Feeding Solution is loaded into the feeding compartment. All lids are closed and the reaction device is loaded into the RTS500 instrument. The instrument is run at 30° C. for 24 hours with a stir bar speed of 150 rpm. The pIVEX 2.3 MCS vector includes a nucleotide sequence that encodes six consecutive histidine amino acids on the C-terminal end of the target polypeptide for the purpose of protein purification. Target polypeptide is purified by contacting the contents of the reaction device with resin modified with Ni²⁺ ions. Target polypeptide is eluted from the resin with a solution containing free Ni²⁺ ions.

Example 12 Cellular Production of Target Polypeptides

Nucleic acids are cloned into DNA plasmids having phage recombination sites and target polypeptides are expressed therefrom in a variety of host cells. Lambda phage genomic DNA contains short sequences known as attP sites, and E. coli genomic DNA contains unique, short sequences known as attB sites. These regions share homology, allowing for integration of phage DNA into E. coli via directional, site-specific recombination using the phage protein Int and the E. coli protein IHF. Integration produces two new att sites, L and R, which flank the inserted prophage DNA. Phage excision from E. coli genomic DNA can also be accomplished using these two proteins with the addition of a second phage protein, Xis. DNA vectors have been produced where the integration/excision process is modified to allow for the directional integration or excision of a target DNA fragment into a backbone vector in a rapid in vitro reaction (Gateway™ Technology (Invitrogen, Inc.)).

A first step is to transfer the nucleic acid insert into a shuttle vector that contains attL sites surrounding the negative selection gene, ccdB (e.g., pENTR vector, Invitrogen, Inc.). One method of transfer is to digest the nucleic acid from a DNA vector used for sequencing and ligate it into the multicloning site of the shuttle vector, which places it between the two attL sites while removing the negative selection gene ccdB. A second method is to amplify the nucleic acid by the polymerase chain reaction (PCR) with primers containing attB sites. The amplified fragment then is integrated into the shuttle vector using Int and IHF. A third method is to utilize a topoisomerase-mediated process, in which the nucleic acid is amplified via PCR using gene-specific primers with the 5′ upstream primer containing an additional CACC sequence (e.g., TOPO® expression kit (Invitrogen, Inc.)). In conjunction with Topoisomerase I, the PCR amplified fragment can be cloned into the shuttle vector via the attL sites in the correct orientation.

Once the nucleic acid is transferred into the shuttle vector, it can be cloned into an expression vector having attR sites. Several vectors containing attR sites for expression of target polypeptide as a native polypeptide, N-fusion polypeptide, and C-fusion polypeptides are commercially available (e.g., pDEST (Invitrogen, Inc.)), and any vector can be converted into an expression vector for receiving a nucleic acid from the shuttle vector by introducing an insert having an attR site flanked by an antibiotic resistance gene for selection using the standard methods described above. Transfer of the nucleic acid from the shuttle vector is accomplished by directional recombination using Int, IHF, and Xis (LR clonase). The desired sequence can then be transferred to an expression vector by carrying out a one hour incubation at room temperature with Int, IHF, and Xis, a ten minute incubation at 37° C. with proteinase K, transforming bacteria and allowing expression for one hour, and then plating on selective media. Generally, 90% cloning efficiency is achieved by this method. Examples of expression vectors are pDEST 14 bacterial expression vector with a T7 promoter, pDEST 15 bacterial expression vector with a T7 promoter and an N-terminal GST tag, pDEST 17 bacterial vector with a T7 promoter and an N-terminal polyhistidine affinity tag, and pDEST 12.2 mammalian expression vector with a CMV promoter and neo resistance gene. These expression vectors or others like them are transformed or transfected into cells for expression of the target polypeptide or polypeptide variants. These expression vectors are often transfected, for example, into murine-transformed adipocyte cell line 3T3-L1 (ATCC), human embryonic kidney cell line 293, and rat cardiomyocyte cell line H9C2.

Example 13 Haplotype Analysis of the KIAA0861 Locus

rs6804951 and rs2001449 are significant at the allele and genotype levels (P<0.05). Moderate LD is observed for markers rs3732602 and rs2293203 (r²=0.646). Chi-squared tests indicate that haplotypes are significantly associated with breast cancer. Cell-specific chi-square values indicate that the TTTTG and CTTTC haplotypes are contributors to this relationship. Odds ratios and score tests indicate that individuals carrying the TTTTG haplotype are less likely to have breast cancer, while individuals with CTTTC are at elevated risk for the disease. Moreover, the odds ratio estimated for the CGTTC haplotype indicates more than a two-fold risk of disease among its carriers, although this result must be interpreted with great caution due to the low observed frequency in the population.

A. Summary Statistics of Alleles and Genotypes

1. SNP Locations

SNP ID      Type       Location
rs6804951   Proximal   184327431
rs7639705   Proximal   184330963
rs3732602   Proximal   184408945
rs2293203   Proximal   184419992
rs2001449   Incident   184429569

2. Allele by GYNGroup

Allele         N      Case (N = 544)   Control (N = 552)   Test Statistic
rs6804951: T   1064   5% (24)          9% (46)             Chi-square = 6.71, d.f. = 1, P = 0.00958
rs7639705: T   1086   80% (434)        81% (441)           Chi-square = 0.03, d.f. = 1, P = 0.868
rs3732602: T   1074   99% (532)        99% (532)           Chi-square = 0.4, d.f. = 1, P = 0.529
rs2293203: T   1088   99% (536)        99% (538)           Chi-square = 0.27, d.f. = 1, P = 0.6
rs2001449: C   1084   30% (161)        22% (119)           Chi-square = 8.49, d.f. = 1, P = 0.00356

3. Genotype by GYNGroup

Genotype        N     Case (N = 272)   Control (N = 276)   Test Statistic
rs6804951: CC   532   91% (238)        83% (225)           Chi-square = 7.13, d.f. = 2, P = 0.0283
           CT         9% (24)          16% (44)
           TT         0% (0)           0% (1)
rs7639705: GG   543   3% (9)           5% (14)             Chi-square = 2.03, d.f. = 2, P = 0.362
           GT         33% (88)         28% (77)
           TT         64% (173)        67% (182)
rs3732602: TT   537   99% (264)        98% (263)           Chi-square = 0.4, d.f. = 1, P = 0.527
rs2293203: TT   544   98% (265)        97% (265)           Chi-square = 0.28, d.f. = 1, P = 0.598
rs2001449: GG   542   47% (128)        60% (162)           Chi-square = 9.29, d.f. = 2, P = 0.00961
           GC         46% (125)        37% (99)
           CC         7% (18)          4% (10)
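The allele-level statistics can be recomputed from the genotype counts, as in the following minimal sketch. It assumes the test is an uncorrected Pearson chi-square on a 2×2 table of allele counts (case/control by allele), with allele counts obtained as twice the homozygote count plus the heterozygote count; the function names are illustrative. Under that assumption the sketch gives values close to those tabulated for rs6804951 and rs2001449.

```python
import math


def allele_counts(hom_first, het, hom_second):
    """Return (first_allele, second_allele) counts from three genotype counts."""
    return 2 * hom_first + het, 2 * hom_second + het


def chi_square_2x2(a, b, c, d):
    """Uncorrected Pearson chi-square and P-value (1 d.f.) for [[a, b], [c, d]]."""
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    p_value = math.erfc(math.sqrt(stat / 2))  # chi-square survival function, 1 d.f.
    return stat, p_value


# rs6804951 genotype counts (CC, CT, TT) for cases and controls, from the genotype table.
case_c, case_t = allele_counts(238, 24, 0)
ctrl_c, ctrl_t = allele_counts(225, 44, 1)
stat_rs6804951, p_rs6804951 = chi_square_2x2(case_c, case_t, ctrl_c, ctrl_t)

# rs2001449 genotype counts (GG, GC, CC) for cases and controls.
case_g, case_cc = allele_counts(128, 125, 18)
ctrl_g, ctrl_cc = allele_counts(162, 99, 10)
stat_rs2001449, p_rs2001449 = chi_square_2x2(case_g, case_cc, ctrl_g, ctrl_cc)

print(f"rs6804951: chi-square = {stat_rs6804951:.2f}, P = {p_rs6804951:.5f}")
print(f"rs2001449: chi-square = {stat_rs2001449:.2f}, P = {p_rs2001449:.5f}")
```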

4. Genotype QC: Test of Hardy-Weinberg Equilibrium

a. Cases

SNP         A.freq   D           ChiSq    Pvalue
rs6804951   0.936    −0.002280   0.7870   0.3750
rs7639705   0.807    0.004790    0.5150   0.4730
rs3732602   0.990    −0.000101   0.0565   0.8120
rs2293203   0.987    −0.000164   0.0921   0.7620
rs2001449   0.744    −0.014500   3.1400   0.0763

b. Controls

SNP         A.freq   D           ChiSq    Pvalue
rs6804951   0.916    −0.003400   0.5350   0.465
rs7639705   0.808    0.014400    2.3600   0.124
rs3732602   0.989    −0.000120   0.0336   0.855
rs2293203   0.985    −0.000213   0.0601   0.806
rs2001449   0.783    −0.010700   1.0800   0.299

B. Summary Statistics: Linkage Disequilibrium

1. Phase Haplotype Frequencies

Haplotype   H.freq   H.relfreq
CGTTC       13       0.012
CGTTG       191      0.175
CTCAG       10       0.009
CTCTG       1        0.001
CTTAG       4        0.004
CTTTC       265      0.243
CTTTG       538      0.493
TGTTG       7        0.006
TTTTC       2        0.002
TTTTG       61       0.056

2. Linkage Disequilibrium Between Markers

a. r²

            rs6804951   rs7639705   rs3732602   rs2293203   rs2001449
rs6804951   1.000000    0.00382     0.000697    0.00089     0.01860
rs7639705   0.003820    1.00000     0.002440    0.00311     0.04770
rs3732602   0.000697    0.00244     1.000000    0.64600     0.00351
rs2293203   0.000890    0.00311     0.646000    1.00000     0.00448
rs2001449   0.018600    0.04770     0.003510    0.00448     1.00000

b. D′

            rs6804951   rs7639705   rs3732602   rs2293203   rs2001449
rs6804951   1.0000      0.116       0.0685      0.0685      0.306
rs7639705   0.1160      1.000       0.2400      0.2400      0.262
rs3732602   0.0685      0.240       1.0000      0.9080      0.345
rs2293203   0.0685      0.240       0.9080      1.0000      0.345
rs2001449   0.3060      0.262       0.3450      0.3450      1.000

c. P-value

            rs6804951   rs7639705   rs3732602   rs2293203   rs2001449
rs6804951   1.00e+00    4.12e−02    0.3830      0.3240      6.40e−06
rs7639705   4.12e−02    1.00e+00    0.1030      0.0653      5.41e−13
rs3732602   3.83e−01    1.03e−01    1.0000      0.0000      5.03e−02
rs2293203   3.24e−01    6.53e−02    0.0000      1.0000      2.70e−02
rs2001449   6.40e−06    5.41e−13    0.0503      0.0270      1.00e+00
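For illustration, pairwise r² and D′ can be computed directly from phased haplotype counts, as in the sketch below. It assumes biallelic, polymorphic sites and uses the standard definitions D = p(AB) − p(A)p(B), r² = D²/[p(A)(1−p(A))p(B)(1−p(B))] and D′ = |D|/Dmax; the function name is illustrative. Applied to the phase haplotype counts listed above, it gives values close to the tabulated r² and D′ for rs3732602 and rs2293203. The normalization convention used for D′ in the tabulated values is not specified here, so D′ for other marker pairs may differ from the table.

```python
from collections import Counter


def pairwise_ld(hap_counts, i, j):
    """Return (D, D', r^2) between sites i and j of phased haplotype strings."""
    total = sum(hap_counts.values())
    site_i, site_j, pair = Counter(), Counter(), Counter()
    for hap, n in hap_counts.items():
        site_i[hap[i]] += n
        site_j[hap[j]] += n
        pair[(hap[i], hap[j])] += n
    # Use the minor allele at each site as the reference allele.
    a = min(site_i, key=site_i.get)
    b = min(site_j, key=site_j.get)
    p_a, p_b = site_i[a] / total, site_j[b] / total
    d = pair[(a, b)] / total - p_a * p_b
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max
    r2 = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d, d_prime, r2


# Phase haplotype counts (marker order: rs6804951, rs7639705, rs3732602,
# rs2293203, rs2001449), taken from the haplotype frequency table.
haplotype_counts = {
    "CGTTC": 13, "CGTTG": 191, "CTCAG": 10, "CTCTG": 1, "CTTAG": 4,
    "CTTTC": 265, "CTTTG": 538, "TGTTG": 7, "TTTTC": 2, "TTTTG": 61,
}
d, d_prime, r2 = pairwise_ld(haplotype_counts, 2, 3)  # rs3732602 vs rs2293203
print(f"r^2 = {r2:.3f}, D' = {d_prime:.3f}")
```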

3. Haplotype by GYNGroup

a. Phase Haplotypes (All)

Haplotype   Case   Case(%)   Case.X^2   Control   Control(%)   Control.X^2   OR       ln.OR
TTTTG       20     1.83      3.55       41        3.75         3.53          0.4782   −0.7377
CTCAG       4      0.37      0.19       6         0.55         0.19          0.6654   −0.4074
TGTTG       3      0.27      0.07       4         0.37         0.07          0.7493   −0.2886
CTTTG       259    23.72     0.30       279       25.55        0.30          0.9060   −0.0987
CGTTG       94     8.61      0.01       97        8.88         0.01          0.9662   −0.0344
CTTAG       2      0.18      0.00       2         0.18         0.00          1.0000   0.0000
TTTTC       1      0.09      0.00       1         0.09         0.00          1.0000   0.0000
CTTTC       151    13.83     2.73       114       10.44        2.71          1.3766   0.3196
CGTTC       9      0.82      0.98       4         0.37         0.98          2.2604   0.8155
CTCTG       1      0.09      0.51       0         0.00         0.50          Inf      Inf

Pearson Chi-squared Test = 16.6377, DF = 9, P-value = 0.0547

b. Phase Haplotypes (Low Frequency Removed)

Haplotype   Case   Case(%)   Case.X^2   Control   Control(%)   Control.X^2   OR       ln.OR
TTTTG       20     1.86      3.55       41        3.80         3.52          0.4781   −0.7379
CTCAG       4      0.37      0.19       6         0.56         0.19          0.6654   −0.4074
CTTTG       259    24.03     0.30       279       25.88        0.30          0.9056   −0.0992
CGTTG       94     8.72      0.01       97        9.00         0.01          0.9661   −0.0345
CTTTC       151    14.01     2.73       114       10.58        2.71          1.3774   0.3202
CGTTC       9      0.83      0.98       4         0.37         0.98          2.2605   0.8156

Pearson Chi-squared Test = 15.4946, DF = 5, P-value = 0.008445

c. Haplo.score Haplotypes

Haplotype   Hap.Freq   Score     P.X^2    P.Sim
TTTTG       0.0529     −2.1206   0.0340   0.0342
TGTTG       0.0101     −2.0668   0.0388   0.0236
CTCAG       0.0073     −1.2914   0.1966   0.2902
CTTTG       0.5221     −1.2275   0.2196   0.2195
CGTTG       0.1448     −0.1441   0.8854   0.8834
CTTTC       0.2267     2.3422    0.0192   0.0192
CGTTC       0.0307     2.6994    0.0069   0.0050

Global Score = 20.343, DF = 7, Global P.X^2 = 0.0049, Global P.Sim = 0.0022

Example 14 Haplotype Analysis of the NUMA1 Locus

All markers noted below except 2276396 are associated with breast cancer at the allele level (P<0.05). Marker 675185 does not maintain this relationship at the genotype level. Strong LD is observed across the entire region but is particularly strong between and among 1894003, 675185, 673478, and 615000. Pearson chi-squared statistics suggest that haplotypes are significantly associated with breast cancer. Haplotype TTCTC contributes the most to this relationship. Odds ratios and score statistics indicate that individuals with haplotype TTCTC are 2.6 times more likely to have breast cancer than individuals with other haplotypes.

Statistics

Chi-squared statistics are estimated to assess whether 1) alleles and genotypes are associated with breast cancer status and 2) marker genotype frequencies deviate significantly from Hardy-Weinberg equilibrium (HWE). Haplotype frequencies and relative frequencies are estimated, as well as several statistics (r², D′, and p-value) that gauge the extent and stability of linkage disequilibrium between markers in each region. Chi-squared statistics and score tests are estimated to determine whether reconstructed haplotypes are significantly associated with breast cancer status (P<0.05). P-values are estimated for 1) the full set of reconstructed haplotypes and 2) a reduced set that excludes haplotypes with observed frequencies less than 10. Results are presented by chromosome order.

Results

Summary Statistics: Alleles and Genotypes

SNP Locations

SNP ID    Type       Location
1894003   Proximal   71972974
675185    Proximal   71998270
673478    Incident   72021802
615000    Proximal   72025650
2276396   Proximal   72046603

Allele by GYNGroup

Allele       N      Case (N = 510)   Control (N = 538)   Test Statistic
1894003: C   1026   91% (450)        96% (510)           Chi-square = 6.95, d.f. = 1, P = 0.00838
675185: G    1010   92% (451)        95% (498)           Chi-square = 3.96, d.f. = 1, P = 0.0466
673478: C    1022   8% (41)          5% (25)             Chi-square = 5.68, d.f. = 1, P = 0.0171
615000: G    1010   92% (434)        96% (513)           Chi-square = 7.4, d.f. = 1, P = 0.00652
2276396: C   1028   97% (478)        98% (523)           Chi-square = 0.18, d.f. = 1, P = 0.674

Genotype by GYNGroup

Genotype      N     Case (N = 255)   Control (N = 269)   Test Statistic
1894003: TT   513   1% (3)           0% (0)              Chi-square = 7.43, d.f. = 2, P = 0.0243
         TC         15% (36)         9% (24)
         CC         84% (207)        91% (243)
675185: TT    505   0% (1)           0% (0)              Chi-square = 4.37, d.f. = 2, P = 0.112
        TG          14% (35)         9% (24)
        GG          85% (208)        91% (237)
673478: TT    511   84% (207)        91% (241)           Chi-square = 6.39, d.f. = 2, P = 0.0409
        TC          14% (35)         9% (25)
        CC          1% (3)           0% (0)
615000: TT    505   1% (3)           0% (0)              Chi-square = 7.8, d.f. = 2, P = 0.0202
        TG          14% (34)         9% (23)
        GG          84% (200)        91% (245)
2276396: CC   514   94% (232)        95% (255)           Chi-square = 0.18, d.f. = 1, P = 0.67

Genotype QC: Test of Hardy-Weinberg Proportions

All

SNP       A.freq   D          ChiSq   Pvalue
1894003   0.935    0.00159    0.350   0.554
675185    0.935    0.00159    0.350   0.554
673478    0.935    0.00159    0.350   0.554
615000    0.937    0.00184    0.495   0.482
2276396   0.974    −0.00069   0.374   0.541

Control

SNP       A.freq   D           ChiSq   Pvalue
1894003   0.953    −0.002190   0.644   0.422
675185    0.953    −0.002190   0.644   0.422
673478    0.953    −0.002190   0.644   0.422
615000    0.957    −0.001860   0.541   0.462
2276396   0.976    −0.000593   0.166   0.683

Summary Statistics: Linkage Disequilibrium

Haplotype Frequencies

Haplotype   H.freq   H.relfreq
CGTGC       961      0.935
TTCGC       1        0.001
TTCGG       1        0.001
TTCTC       39       0.038
TTCTG       26       0.025

Linkage Disequilibrium Between Markers

r²

          1894003   675185   GP11.079035103   615000   2276396
1894003   1.000     1.000    1.000            0.968    0.387
675185    1.000     1.000    1.000            0.968    0.387
673478    1.000     1.000    1.000            0.968    0.387
615000    0.968     0.968    0.968            1.000    0.369
2276396   0.387     0.387    0.387            0.369    1.000

D′

          1894003   675185   GP11.079035103   615000   2276396
1894003   1         1        1                1.00     1.00
675185    1         1        1                1.00     1.00
673478    1         1        1                1.00     1.00
615000    1         1        1                1.00     0.96
2276396   1         1        1                0.96     1.00

P-value

                 1894003   675185   GP11.079035103   615000   2276396
1894003          1         0        0                0        0
675185           0         1        0                0        0
GP11.079035103   0         0        1                0        0
615000           0         0        0                1        0
2276396          0         0        0                0        1

Haplotype by GYNGroup

PHASE Haplotypes (All)

Haplotype   Case   Case(%)   Case.X^2   Control   Control(%)   Control.X^2   OR       ln.OR
TTCGC       0      0.00      0.48       1         0.10         0.44          0.0000   −Inf
TTCGG       0      0.00      0.48       1         0.10         0.44          0.0000   −Inf
CGTGC       452    43.97     0.21       509       49.51        0.19          0.8001   −0.2230
TTCTG       14     1.36      0.18       12        1.17         0.17          1.1690   0.1561
TTCTC       28     2.72      4.57       11        1.07         4.23          2.5887   0.9512

Pearson Chi-squared Test = 11.4058, DF = 4, P-value = 0.02236
Permutation Test P-value = 0.14

PHASE Haplotypes (Low Frequency Excluded)

Haplotype   Case   Case(%)   Case.X^2   Control   Control(%)   Control.X^2   OR       ln.OR
CGTGC       452    44.05     0.25       509       49.61        0.23          0.7998   −0.2234
TTCTG       14     1.36      0.18       12        1.17         0.16          1.1690   0.1561
TTCTC       28     2.73      4.53       11        1.07         4.21          2.5888   0.9512

Pearson Chi-squared Test = 9.5506, DF = 2, P-value = 0.008435
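The two haplotype-level Pearson statistics (full set and reduced set) can be reproduced with the following minimal sketch. It assumes a haplotype-by-status contingency table with expected counts computed from the row and column totals, and it drops haplotypes whose combined case and control count is below 10 to form the reduced set; the function names are illustrative and the P-value calculation is omitted.

```python
def pearson_chi_square(table):
    """Pearson chi-square for {haplotype: (case_count, control_count)}.

    Returns (statistic, degrees_of_freedom) for the k x 2 table.
    """
    case_total = sum(case for case, _ in table.values())
    control_total = sum(control for _, control in table.values())
    n = case_total + control_total
    stat = 0.0
    for case, control in table.values():
        row_total = case + control
        for observed, column_total in ((case, case_total), (control, control_total)):
            expected = row_total * column_total / n
            stat += (observed - expected) ** 2 / expected
    return stat, len(table) - 1


def drop_low_frequency(table, min_count=10):
    """Keep haplotypes observed at least `min_count` times across both groups."""
    return {hap: counts for hap, counts in table.items() if sum(counts) >= min_count}


# Case and control counts from the PHASE haplotype table for the NUMA1 locus.
numa1_haplotypes = {
    "TTCGC": (0, 1),
    "TTCGG": (0, 1),
    "CGTGC": (452, 509),
    "TTCTG": (14, 12),
    "TTCTC": (28, 11),
}
full_stat, full_df = pearson_chi_square(numa1_haplotypes)
reduced_table = drop_low_frequency(numa1_haplotypes)
reduced_stat, reduced_df = pearson_chi_square(reduced_table)
print(f"All haplotypes: chi-square = {full_stat:.4f}, DF = {full_df}")
print(f"Low frequency excluded: chi-square = {reduced_stat:.4f}, DF = {reduced_df}")
```

Recomputing the expected counts after filtering is what changes the per-cell contributions between the two tables, even for haplotypes that appear in both.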

haplo.score Haplotypes

Haplotype   Hap.Freq   Score     P.X^2    P.Sim
CGTGC       0.9410     −2.0316   0.0422   0.0531
TTCTG       0.0248     0.3232    0.7465   0.8344
TTCTC       0.0321     2.6973    0.0070   0.0093

Global Score = 9.1386, DF = 3, Global P.X^2 = 0.0275, Global P.Sim = 0.0212

Modifications may be made to the foregoing without departing from the basic aspects of the invention. Although the invention has been described in substantial detail with reference to one or more specific embodiments, those of skill in the art will recognize that changes may be made to the embodiments specifically disclosed in this application, yet these modifications and improvements are within the scope and spirit of the invention, as set forth in the claims which follow. All publications or patent documents cited in this specification are incorporated herein by reference as if each such publication or document was specifically and individually indicated to be incorporated herein by reference.

Citation of the above publications or documents is not intended as an admission that any of the foregoing is pertinent prior art, nor does it constitute any admission as to the contents or date of these publications or documents. U.S. patents, documents and other publications referenced herein are hereby incorporated by reference.

1. (canceled)
2. A method for determining whether a human subject is at an increased risk or decreased risk of breast cancer, which comprises: (a) detecting in a nucleic acid of the human subject the presence of a polymorphic variant selected from the group consisting of an adenine corresponding to position 11963 of SEQ ID NO: 1, a guanine corresponding to position 36340 of SEQ ID NO: 1, an adenine corresponding to position 36992 of SEQ ID NO: 1, a guanine corresponding to position 37868 of SEQ ID NO: 1, a cytosine corresponding to position 41213 of SEQ ID NO: 1, a guanine corresponding to position 41419 of SEQ ID NO: 1, a cytosine corresponding to position 42407 of SEQ ID NO: 1, a cytosine corresponding to position 44247 of SEQ ID NO: 1, a guanine corresponding to position 44677 of SEQ ID NO: 1, a thymine corresponding to position 45256 of SEQ ID NO: 1, a cytosine corresponding to position 51102 of SEQ ID NO: 1 and a guanine corresponding to position 72360 of SEQ ID NO: 1, and a complement of the foregoing; or (b) detecting in a nucleic acid of the human subject the presence of a polymorphic variant selected from the group consisting of a guanine corresponding to position 11963 of SEQ ID NO: 1, an adenine corresponding to position 36340 of SEQ ID NO: 1, a guanine corresponding to position 36992 of SEQ ID NO: 1, an adenine corresponding to position 37868 of SEQ ID NO: 1, a thymine corresponding to position 41213 of SEQ ID NO: 1, a cytosine corresponding to position 41419 of SEQ ID NO: 1, a guanine corresponding to position 42407 of SEQ ID NO: 1, a thymine corresponding to position 44247 of SEQ ID NO: 1, an adenine corresponding to position 44677 of SEQ ID NO: 1, a cytosine corresponding to position 45256 of SEQ ID NO: 1, a thymine corresponding to position 51102 of SEQ ID NO: 1 and an adenine corresponding to position 72360 of SEQ ID NO: 1, and a complement of the foregoing; whereby it is determined that the subject is at an increased risk of breast cancer based on the presence of one or more of the polymorphic variants of (a), and whereby it is determined that the subject is at a decreased risk of breast cancer based on the presence of one or more of the polymorphic variations of (b).
3. The method of claim 2, which further comprises obtaining the nucleic acid sample from the subject.
4. The method of claim 2, wherein detecting the presence of the one or more polymorphic variants comprises: hybridizing an oligonucleotide to the nucleic acid from the subject, wherein the oligonucleotide is complementary to a nucleotide sequence in the nucleic acid and hybridizes to a region adjacent to the polymorphic variant; extending the oligonucleotide in the presence of one or more nucleotides, yielding extension products; and detecting the presence of a polymorphic variant in the extension products.
5. The method of claim 2, wherein the polymorphic variant detected is an adenine corresponding to position 11963 of SEQ ID NO: 1, or a complement thereof.
6. The method of claim 2, wherein the polymorphic variant detected is a guanine corresponding to position 36340 of SEQ ID NO: 1, or a complement thereof.
7. The method of claim 2, wherein the polymorphic variant detected is an adenine corresponding to position 36992 of SEQ ID NO: 1, or a complement thereof.
8. The method of claim 2, wherein the polymorphic variant detected is a guanine corresponding to position 37868 of SEQ ID NO: 1, or a complement thereof.
9. The method of claim 2, wherein the polymorphic variant detected is a cytosine corresponding to position 41213 of SEQ ID NO: 1, or a complement thereof.
10. The method of claim 2, wherein the polymorphic variant detected is a guanine corresponding to position 41419 of SEQ ID NO: 1, or a complement thereof.
11. The method of claim 2, wherein the polymorphic variant detected is a cytosine corresponding to position 42407 of SEQ ID NO: 1, or a complement thereof.
12. The method of claim 2, wherein the polymorphic variant detected is a thymine corresponding to position 44247 of SEQ ID NO: 1, or a complement thereof.
13. The method of claim 2, wherein the polymorphic variant detected is a guanine corresponding to position 44677 of SEQ ID NO: 1, or a complement thereof.
14. The method of claim 2, wherein the polymorphic variant detected is a thymine corresponding to position 45256 of SEQ ID NO: 1, or a complement thereof.
15. The method of claim 2, wherein the polymorphic variant detected is a cytosine corresponding to position 51102 of SEQ ID NO: 1, or a complement thereof.
16. The method of claim 2, wherein the polymorphic variant detected is a guanine corresponding to position 72360 of SEQ ID NO: 1, or a complement thereof.
17. The method of claim 2, wherein the polymorphic variant detected is an adenine corresponding to position 7573 of SEQ ID NO: 2, or a complement thereof.
18. The method of claim 2, wherein the polymorphic variant detected is a guanine corresponding to position 11963 of SEQ ID NO: 1, or a complement thereof.
19. The method of claim 2, wherein the polymorphic variant detected is an adenine corresponding to position 36340 of SEQ ID NO: 1, or a complement thereof.
20. The method of claim 2, wherein the polymorphic variant detected is a guanine corresponding to position 36992 of SEQ ID NO: 1, or a complement thereof.
21. The method of claim 2, wherein the polymorphic variant detected is an adenine corresponding to position 37868 of SEQ ID NO: 1, or a complement thereof.
22. The method of claim 2, wherein the polymorphic variant detected is a thymine corresponding to position 41213 of SEQ ID NO: 1, or a complement thereof.
23. The method of claim 2, wherein the polymorphic variant detected is a cytosine corresponding to position 41419 of SEQ ID NO: 1, or a complement thereof.
24. The method of claim 2, wherein the polymorphic variant detected is a guanine corresponding to position 42407 of SEQ ID NO: 1, or a complement thereof.
25. The method of claim 2, wherein the polymorphic variant detected is a thymine corresponding to position 44247 of SEQ ID NO: 1, or a complement thereof.
26. The method of claim 2, wherein the polymorphic variant detected is an adenine corresponding to position 44677 of SEQ ID NO: 1, or a complement thereof.
27. The method of claim 2, wherein the polymorphic variant detected is a cytosine corresponding to position 45256 of SEQ ID NO: 1, or a complement thereof.
28. The method of claim 2, wherein the polymorphic variant detected is a thymine corresponding to position 51102 of SEQ ID NO: 1, or a complement thereof.
29. The method of claim 2, wherein the polymorphic variant detected is an adenine corresponding to position 72360 of SEQ ID NO: 1, or a complement thereof.
30. The method of claim 2, wherein the human subject is Caucasian.
31. A method for determining whether a breast cancer detection procedure is administered to a human subject, which comprises: (a) detecting in a nucleic acid of the human subject the presence of a polymorphic variant selected from the group consisting of an adenine corresponding to position 11963 of SEQ ID NO: 1, a guanine corresponding to position 36340 of SEQ ID NO: 1, an adenine corresponding to position 36992 of SEQ ID NO: 1, a guanine corresponding to position 37868 of SEQ ID NO: 1, a cytosine corresponding to position 41213 of SEQ ID NO: 1, a guanine corresponding to position 41419 of SEQ ID NO: 1, a cytosine corresponding to position 42407 of SEQ ID NO: 1, a cytosine corresponding to position 44247 of SEQ ID NO: 1, a guanine corresponding to position 44677 of SEQ ID NO: 1, a thymine corresponding to position 45256 of SEQ ID NO: 1, a cytosine corresponding to position 51102 of SEQ ID NO: 1 and a guanine corresponding to position 72360 of SEQ ID NO: 1, and a complement of the foregoing; or (b) detecting in a nucleic acid of the human subject the presence of a polymorphic variant selected from the group consisting of a guanine corresponding to position 11963 of SEQ ID NO: 1, an adenine corresponding to position 36340 of SEQ ID NO: 1, a guanine corresponding to position 36992 of SEQ ID NO: 1, an adenine corresponding to position 37868 of SEQ ID NO: 1, a thymine corresponding to position 41213 of SEQ ID NO: 1, a cytosine corresponding to position 41419 of SEQ ID NO: 1, a guanine corresponding to position 42407 of SEQ ID NO: 1, a thymine corresponding to position 44247 of SEQ ID NO: 1, an adenine corresponding to position 44677 of SEQ ID NO: 1, a cytosine corresponding to position 45256 of SEQ ID NO: 1, a thymine corresponding to position 51102 of SEQ ID NO: 1 and an adenine corresponding to position 72360 of SEQ ID NO: 1, and a complement of the foregoing; and (c) administering a breast cancer detection procedure to a human subject determined to have an increased risk of breast cancer based on the presence of the one or more polymorphic variants of (a), or not administering a breast cancer detection procedure to a human subject determined to have a decreased risk of breast cancer based on the presence of the one or more polymorphic variants of (b).
32. The method of claim 31, which further comprises obtaining the nucleic acid sample from the subject.
33. The method of claim 31, wherein detecting the presence of the one or more polymorphic variants comprises: hybridizing an oligonucleotide to the nucleic acid from the subject, wherein the oligonucleotide is complementary to a nucleotide sequence in the nucleic acid and hybridizes to a region adjacent to the polymorphic variant; extending the oligonucleotide in the presence of one or more nucleotides, yielding extension products; and detecting the presence of a polymorphic variant in the extension products.
34. The method of claim 31, wherein the breast cancer detection procedure is selected from the group consisting of a mammography, an early mammography program, a frequent mammography program, a biopsy procedure, a breast biopsy and biopsy from another tissue, a breast ultrasound and optionally ultrasound analysis of another tissue, breast magnetic resonance imaging (MRI) and optionally MRI analysis of another tissue, electrical impedance (T-scan) analysis of breast and optionally of another tissue, ductal lavage, nuclear medicine analysis, scintimammography, BRCA1 and/or BRCA2 sequence analysis results, thermal imaging of the breast and optionally of another tissue, and a combination of the foregoing.
35. The method of claim 31, wherein the polymorphic variant detected is an adenine corresponding to position 11963 of SEQ ID NO: 1, or a complement thereof.
36. The method of claim 31, wherein the polymorphic variant detected is a guanine corresponding to position 36340 of SEQ ID NO: 1, or a complement thereof.
37. The method of claim 31, wherein the polymorphic variant detected is an adenine corresponding to position 36992 of SEQ ID NO: 1, or a complement thereof.
38. The method of claim 31, wherein the polymorphic variant detected is a guanine corresponding to position 37868 of SEQ ID NO: 1, or a complement thereof.
39. The method of claim 31, wherein the polymorphic variant detected is a cytosine corresponding to position 41213 of SEQ ID NO: 1, or a complement thereof.
40. The method of claim 31, wherein the polymorphic variant detected is a guanine corresponding to position 41419 of SEQ ID NO: 1, or a complement thereof.
41. The method of claim 31, wherein the polymorphic variant detected is a cytosine corresponding to position 42407 of SEQ ID NO: 1, or a complement thereof.
42. The method of claim 31, wherein the polymorphic variant detected is a thymine corresponding to position 44247 of SEQ ID NO: 1, or a complement thereof.
43. The method of claim 31, wherein the polymorphic variant detected is a guanine corresponding to position 44677 of SEQ ID NO: 1, or a complement thereof.
44. The method of claim 31, wherein the polymorphic variant detected is a thymine corresponding to position 45256 of SEQ ID NO: 1, or a complement thereof.
45. The method of claim 31, wherein the polymorphic variant detected is a cytosine corresponding to position 51102 of SEQ ID NO: 1, or a complement thereof.
46. The method of claim 31, wherein the polymorphic variant detected is a guanine corresponding to position 72360 of SEQ ID NO: 1, or a complement thereof.
47. The method of claim 31, wherein the polymorphic variant detected is an adenine corresponding to position 7573 of SEQ ID NO: 2, or a complement thereof.
48. The method of claim 31, wherein the polymorphic variant detected is a guanine corresponding to position 11963 of SEQ ID NO: 1, or a complement thereof.
49. The method of claim 31, wherein the polymorphic variant detected is an adenine corresponding to position 36340 of SEQ ID NO: 1, or a complement thereof.
50. The method of claim 31, wherein the polymorphic variant detected is a guanine corresponding to position 36992 of SEQ ID NO: 1, or a complement thereof.
51. The method of claim 31, wherein the polymorphic variant detected is an adenine corresponding to position 37868 of SEQ ID NO: 1, or a complement thereof.
52. The method of claim 31, wherein the polymorphic variant detected is a thymine corresponding to position 41213 of SEQ ID NO: 1, or a complement thereof.
53. The method of claim 31, wherein the polymorphic variant detected is a cytosine corresponding to position 41419 of SEQ ID NO: 1, or a complement thereof.
54. The method of claim 31, wherein the polymorphic variant detected is a guanine corresponding to position 42407 of SEQ ID NO: 1, or a complement thereof.
55. The method of claim 31, wherein the polymorphic variant detected is a thymine corresponding to position 44247 of SEQ ID NO: 1, or a complement thereof.
56. The method of claim 31, wherein the polymorphic variant detected is an adenine corresponding to position 44677 of SEQ ID NO: 1, or a complement thereof.
57. The method of claim 31, wherein the polymorphic variant detected is a cytosine corresponding to position 45256 of SEQ ID NO: 1, or a complement thereof.
58. The method of claim 31, wherein the polymorphic variant detected is a thymine corresponding to position 51102 of SEQ ID NO: 1, or a complement thereof.
59. The method of claim 31, wherein the polymorphic variant detected is an adenine corresponding to position 72360 of SEQ ID NO: 1, or a complement thereof.
60. The method of claim 31, wherein the human subject is Caucasian.